This script provides an exploratory data analysis (EDA) and visualisations for the generated timeseries of the earthquakes dataset. The core routines were developed by the same author in previous courses at the University of London (Mohr, 2021, 2023, 2024a) and have been refined further for this MSc thesis project/research. The code has also been updated to comply with current requirements and package interdependencies. Some commentary is included in this Jupyter Notebook, and the code contains several inline comments. For details on the project/research itself, refer to the accompanying document.
Mohr, S. (2021) Regional Spatial Clusters of Earthquakes at the Pacific Ring of Fire: Analysing Data from the USGS ANSS ComCat and Building Regional Spatial Clusters. DSM020, Python, examined coursework cw1. University of London.
Mohr, S. (2023) Clustering of Earthquakes on a Worldwide Scale with the Help of Big Data Machine Learning Methods. DSM010, Big Data, examined coursework cw2. University of London.
Mohr, S. (2024a) Comparing Different Tectonic Setups Considering Publicly Available Basic Earthquake’s Data. DSM050, Data Visualisation, examined coursework cw1. University of London.
250111 Generation of script, loading and formatting timeseries, timeseries statistics, plotting timeseries,
technical basis (ACF, PACF, stationarity, differencing)
250112 Plotting ACF and PACF (function), enhancing analysis for stationarity and differencing,
function for stationarity analysis, moving all functions to shared_procedures.py,
first tests with cross-correlation, refining CCF(ts1,ts2) and CCF(ts2,ts1), prepare workflow basics
250113 TOPn CCF for documentation and comparison per function top_max_min_indices_as_dataframe,
adding information to figures, adding CI bands, combine everything in function calculate_ccf_results,
make functions for CCF, move all functions to shared_procedures
250114 Rename 'size' to 'count', completely remake show_crosscorrelation_results by adding confidence intervals,
re-organize workflow, group the important analyses (so far) and clean the code and workbook; no need for
a dictionary or anything similar, as it would be unnecessary work (!)
250115 Added ACF, PACF monthly plots, use scatterplot and stem as an alternative, add more custom parameters,
seasonal component analysis for ActiveEruptions, adding first tests with FFT
250116 Adding information about timeseries continuity and ruptures, Fourier analysis, FFT for ActiveEruptions,
x-limits for FFT figures, finalise FFT, improve stem plots for CCF
250117 Box-Cox transformation, Ljung-Box test, heteroscedasticity (HET) testing
250118 Make the same workflow everywhere, add all data so far, add STFT analysis, show CCF coefficients at the
lags corresponding to the high-power FFT frequencies to compare them with the FFT analysis and decide on
their importance (FFT combined with CCF), check the complete notebook for successful processing and library
imports and make some cleanups, keeping only the parts that are really needed
250121 Switching to parameters.py, adding selectable cluster directory, changing all directories,
saving the complete notebook as experimentation protocol to the cluster directory, add scope and cluster to
figure titles, saving CCF figures to data_dir_cluster, save timeseries rupture plot, check and drop
non-matching values for Box-Cox transformations
250122 Clean everything, always show the same timeseries, add titles and labels where possible and adequate,
calculate several CCFs (see protocol), check everything, duplicate and adapt it for the GVP analysis, GOOD LUCK!
250123 Prepare everything for creating the figures and protocols (establish test protocol), run gvp_1000 analysis
250124 Update workflow for CCF, add all selected timeseries everywhere, clean up everything again, make one single
50_* workflow script for all scopes, analysing scope gvp_1000 and studyarea_1000
==> scope studyarea_1000 and scope gvp_1000 are clean and results are saved!
./.
# which python installation and version are we using here?
print('\n******* Python Info ***********')
!which python
!python --version
# show some CPU and RAM info
print('\n******* CPU Info ***********')
!lscpu
print('\n******* RAM Info (in GB) ***********')
!free -g
******* Python Info ***********
/bin/python
Python 3.8.10
******* CPU Info ***********
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz
Stepping: 7
CPU MHz: 1200.024
CPU max MHz: 4000.0000
CPU min MHz: 1200.0000
BogoMIPS: 6600.00
Virtualization: VT-x
L1d cache: 1 MiB
L1i cache: 1 MiB
L2 cache: 32 MiB
L3 cache: 99 MiB
NUMA node0 CPU(s): 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60
NUMA node1 CPU(s): 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61
NUMA node2 CPU(s): 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62
NUMA node3 CPU(s): 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Vulnerable, KVM SW loop
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
******* RAM Info (in GB) ***********
total used free shared buff/cache available
Mem: 251 46 193 0 11 203
Swap: 63 26 37
# show installed packages and versions
!pip freeze
absl-py==2.1.0 affine==2.4.0 aggdraw==1.3.16 arch==5.6.0 array-record==0.4.0 astropy==5.2.2 asttokens==2.4.1 astunparse==1.6.3 atomicwrites==1.1.5 attrs==19.3.0 Automat==0.8.0 backcall==0.2.0 beautifulsoup4==4.12.3 blinker==1.4 branca==0.8.1 cachetools==5.5.0 certifi==2019.11.28 cftime==1.6.4.post1 chardet==3.0.4 click==8.1.7 click-plugins==1.1.1 cligj==0.7.2 cloud-init==24.3.1 clustpy==0.0.1 colorama==0.4.3 comm==0.2.2 command-not-found==0.3 configobj==5.0.6 confluent-kafka==2.5.3 constantly==15.1.0 contextily==1.5.2 contourpy==1.1.1 cryptography==2.8 cupshelpers==1.0 cycler==0.10.0 dbus-python==1.2.16 debugpy==1.8.7 decorator==4.4.2 defer==1.0.6 distro==1.4.0 distro-info==0.23+ubuntu1.1 dm-tree==0.1.8 dtw==1.4.0 entrypoints==0.3 et-xmlfile==1.0.1 etils==1.3.0 executing==2.0.1 fail2ban==0.11.1 fastjsonschema==2.20.0 filelock==3.13.1 fiona==1.9.6 flatbuffers==24.3.25 folium==0.18.0 fonttools==4.53.1 frites==0.4.4 fsspec==2023.12.2 ftfy==6.2.0 gast==0.4.0 geodatasets==2024.8.0 geographiclib==2.0 geopandas==0.13.2 geopy==2.4.1 google-auth==2.36.0 google-auth-oauthlib==1.0.0 google-pasta==0.2.0 googleapis-common-protos==1.66.0 graphviz==0.20.3 grpcio==1.68.1 h5netcdf==1.1.0 h5py==3.11.0 hdbscan==0.8.40 html5lib==1.0.1 httplib2==0.14.0 huggingface-hub==0.22.2 hyperlink==19.0.0 idna==2.8 import-ipynb==0.2 importlib-metadata==8.5.0 importlib-resources==6.4.0 incremental==16.10.1 ipykernel==6.29.5 ipynb==0.5.1 ipynbname==2024.1.0.0 ipython==8.12.3 jdcal==1.0 jedi==0.19.1 Jinja2==2.10.1 joblib==1.4.2 jsonpatch==1.22 jsonpointer==2.0 jsonschema==3.2.0 jupyter-client==8.6.3 jupyter-core==5.7.2 keras==2.13.1 keyring==18.0.1 kiwisolver==1.3.1 language-selector==0.1 launchpadlib==1.10.13 lazr.restfulclient==0.14.2 lazr.uri==1.0.3 lazy-loader==0.4 libclang==18.1.1 libpysal==4.8.1 lxml==4.5.0 macaroonbakery==1.3.1 Markdown==3.7 MarkupSafe==2.1.5 matplotlib==3.7.5 matplotlib-inline==0.1.7 mercantile==1.2.1 mne==1.6.1 more-itertools==4.2.0 mpmath==1.3.0 nbformat==5.10.4 
nest-asyncio==1.6.0 netCDF4==1.7.2 netifaces==0.10.4 networkx==3.1 nltk==3.9.1 numexpr==2.7.1 numpy==1.24.4 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu12==2.18.1 nvidia-nvjitlink-cu12==12.3.101 nvidia-nvtx-cu12==12.1.105 oauthlib==3.1.0 olefile==0.46 open-clip-torch==2.24.0 openpyxl==3.0.3 opt-einsum==3.3.0 packaging==23.2 pandas==2.0.3 parso==0.8.4 patsy==0.5.4 pexpect==4.6.0 pgmpy==0.1.24 pickleshare==0.7.5 Pillow==8.2.0 platformdirs==4.3.6 plotly==5.24.1 pluggy==0.13.0 pooch==1.8.2 promise==2.3 prompt-toolkit==3.0.47 property-cached==1.6.4 protobuf==4.25.5 psutil==6.0.0 pure-eval==0.2.2 py==1.8.1 py4j==0.10.9.7 pyasn1==0.4.2 pyasn1-modules==0.2.1 pycairo==1.16.2 pyclust==0.2.0 pycups==1.9.73 pycwt==0.4.0b0 pydot==2.0.0 pyerfa==2.0.0.3 pygments==2.18.0 PyGObject==3.36.0 PyHamcrest==1.9.0 pyinotify==0.9.6 PyJWT==1.7.1 pymacaroons==0.13.0 PyNaCl==1.3.0 pyOpenSSL==19.0.0 pyparsing==3.1.4 pyproj==3.5.0 pyRFC3339==1.1 pyrsistent==0.15.5 pyserial==3.4 pyspark==3.4.1 pytest==4.6.9 python-apt==2.0.1+ubuntu0.20.4.1 python-dateutil==2.8.2 python-debian==0.1.36+ubuntu1.1 pytz==2023.3.post1 PyYAML==5.3.1 pyzmq==26.2.0 rasterio==1.3.10 regex==2024.4.28 requests==2.22.0 requests-oauthlib==2.0.0 requests-unixsocket==0.2.0 rpt==2.0.0 rsa==4.9 ruptures==1.1.9 safetensors==0.4.3 scikit-learn==1.3.2 scipy==1.10.1 seaborn==0.13.2 SecretStorage==2.3.1 sentence-transformers==3.1.1 sentencepiece==0.2.0 service-identity==18.1.0 shapely==2.0.6 simplejson==3.16.0 six==1.14.0 snuggs==1.4.7 sos==4.7.2 soupsieve==1.9.5 spark-nlp==3.4.1 ssh-import-id==5.10 stack-data==0.6.3 statsmodels==0.14.1 summarytools==0.3.0 sympy==1.12 systemd-python==234 tables==3.6.1 tabulate==0.9.0 tenacity==9.0.0 tensorboard==2.13.0 tensorboard-data-server==0.7.2 
tensorflow==2.13.1 tensorflow-datasets==4.9.2 tensorflow-estimator==2.13.0 tensorflow-io-gcs-filesystem==0.34.0 tensorflow-metadata==1.14.0 termcolor==2.4.0 threadpoolctl==3.1.0 timm==0.9.16 tokenizers==0.19.1 toml==0.10.2 torch==2.1.2 torchvision==0.18.0 tornado==6.4.1 tqdm==4.66.1 traitlets==5.14.3 transformers==4.40.2 treelib==1.7.0 triton==2.1.0 Twisted==18.9.0 typing-extensions==4.5.0 tzdata==2023.3 ubuntu-pro-client==8001 ufw==0.36 unattended-upgrades==0.1 urllib3==1.25.8 visualkeras==0.1.4 wadllib==1.3.3 wcwidth==0.2.13 webencodings==0.5.1 werkzeug==3.0.6 wrapt==1.17.0 xarray==2023.1.0 xlrd==1.1.0 xlwt==1.3.0 xyzservices==2024.6.0 zipp==3.20.2 zope.interface==4.7.1
# work around a long-standing PATH problem on LENA:
# add the user package path to sys.path (guarded, to avoid duplicate entries)
import sys
user_site = "/home/smohr001/.local/lib/python3.8/site-packages"
if user_site not in sys.path:
    sys.path.append(user_site)
sys.path
['/home/smohr001/thesis', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '', '/opt/jupyterhub/lib/python3.8/site-packages', '/opt/jupyterhub/lib/python3.8/site-packages/IPython/extensions', '/home/smohr001/.ipython', '/home/smohr001/.local/lib/python3.8/site-packages', '/home/smohr001/.local/lib/python3.8/site-packages', '/home/smohr001/.local/lib/python3.8/site-packages']
# importing standard libraries
import sys
import os
import warnings
import datetime
import time
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
import logging
from IPython.display import display, Javascript
# importing shared procedures for this project (needs to be a plain .py file)
%run shared_procedures.py
# importing additional libraries
# get info about the installed and used versions of some important libraries
print("Some important installed libraries:\n")
print(f"Pandas version: {pd.__version__}")
print(f"Numpy version: {np.__version__}")
print(f"Seaborn version: {sns.__version__}")
Some important installed libraries:

Pandas version: 1.4.1
Numpy version: 1.22.2
Seaborn version: 0.13.2
# import the parameters (forced to reload)
import parameters
import importlib
importlib.reload(parameters)
# print all loaded parameters from the Python file
print_loaded_parameters(parameters)
Loaded Parameters:
parameters.buffer_size_km = 1000
parameters.data_dir = data/scope/reclustered_1000
parameters.data_dir_raw = data
parameters.data_ts_dir = data/scope/reclustered_1000
parameters.scope_id = reclustered_1000
# show all matplotlib graphs inline
%matplotlib inline
# setting format to JPG for easy copy & paste for figures
# for high quality outputs choose 'svg'
%config InlineBackend.figure_format = 'jpg'
# adjust display settings to show 20 rows as a standard
pd.set_option('display.max_rows', 20)
# disable warnings (activate after developing the complete code)
warnings.filterwarnings('ignore')
# set script (ipynb notebook) name (e.g. for logging)
script_name = "50_timeseries_analysis.ipynb"
# start parameterized logging
setup_logging(logfile_dir="log",
              logfile_name="50_analysis.log",
              log_level=logging.INFO,
              script_name=script_name
              )
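setup_logging is defined in shared_procedures.py, which is not shown here; as a rough sketch of what such a parameterised setup might look like (the function name and the demo buffer are illustrative, not the project's actual implementation), using only the standard library:

```python
import io
import logging

def setup_logging_sketch(stream, log_level=logging.INFO, script_name="notebook"):
    """Attach a handler with the timestamped format seen in the log outputs below."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(log_level)
    logging.info(f"Starting script '{script_name}'.")

# demonstrate on an in-memory buffer instead of a logfile
buf = io.StringIO()
setup_logging_sketch(buf, script_name="50_timeseries_analysis.ipynb")
print(buf.getvalue().strip())
```

The real helper additionally writes to a rotating file in `logfile_dir`, which is omitted here.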
# set data directory
if parameters.data_dir_raw:
    data_dir_raw = parameters.data_dir_raw
else:
    data_dir_raw = "data"
logging.info(f"{script_name}: Set raw data directory to './{data_dir_raw}'.")

if parameters.data_dir:
    data_dir = parameters.data_dir
else:
    data_dir = "data"
logging.info(f"{script_name}: Set scope data directory to './{data_dir}'.")

if parameters.data_ts_dir:
    data_ts_dir = parameters.data_ts_dir
else:
    data_ts_dir = "data/timeseries"
logging.info(f"{script_name}: Set scope timeseries data directory to './{data_ts_dir}'.")
2025-01-27 19:15:47,721 - INFO - Starting script '50_timeseries_analysis.ipynb'.
2025-01-27 19:15:47,723 - INFO - Set loglevel to INFO.
2025-01-27 19:15:47,723 - INFO - 50_timeseries_analysis.ipynb: Set raw data directory to './data'.
2025-01-27 19:15:47,724 - INFO - 50_timeseries_analysis.ipynb: Set scope data directory to './data/scope/reclustered_1000'.
2025-01-27 19:15:47,725 - INFO - 50_timeseries_analysis.ipynb: Set scope timeseries data directory to './data/scope/reclustered_1000'.
# set fixed seed for reproducibility
reset_random_seeds(script_name=script_name)
2025-01-27 19:15:47,729 - INFO - 50_timeseries_analysis.ipynb: Seeding random generators with seed=654321.
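reset_random_seeds is another shared_procedures helper; a minimal equivalent that seeds the common global generators (the function name here is illustrative; the seed value is the one recorded in the log above) might look like:

```python
import random
import numpy as np

def reset_random_seeds_sketch(seed=654321):
    """Seed Python's and NumPy's global random generators for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)

# re-seeding reproduces the exact same draws
reset_random_seeds_sketch()
a = np.random.rand(3)
reset_random_seeds_sketch()
b = np.random.rand(3)
```

The project version may seed further libraries (e.g. framework-specific generators); this sketch covers only the two used directly in this notebook.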
# set individual clusterdir for saving
selected_subdir = choose_subdirectory(data_dir)
data_dir_cluster = os.path.join(data_dir, selected_subdir)
data_ts_dir_cluster = os.path.join(data_ts_dir, selected_subdir)
logging.info(f"{script_name}: Set cluster data directory to './{data_dir_cluster}'.")
logging.info(f"{script_name}: Set cluster timeseries data directory to './{data_ts_dir_cluster}'.")
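choose_subdirectory is interactive (see the prompt captured in the output further below); its non-interactive core, listing the candidate cluster subdirectories, could be sketched as follows (function name and demo tree are illustrative):

```python
import os
import tempfile

def list_subdirectories(path):
    """Return the sorted subdirectories of `path` (the candidates offered for selection)."""
    return sorted(d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d)))

# demonstrate on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    for name in ["cluster_Cluster_1", "cluster_Cluster_0"]:
        os.mkdir(os.path.join(tmp, name))
    subdirs = list_subdirectories(tmp)
```

The interactive wrapper would number these entries, prompt for a choice with input(), and return the selected name.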
# set all timeseries names for saving and loading
ts_names_load = ["eq_ts_daily_multi", "eq_ts_weekly_multi", "eq_ts_monthly_multi", "eq_ts_yearly_multi",
"erupt_ts_daily_multi", "erupt_ts_weekly_multi", "erupt_ts_monthly_multi", "erupt_ts_yearly_multi"]
# iterate over the timeseries names and load them into global objects
for name in ts_names_load:
    ts = load_timeseries(
        data_file=f"{name}.csv",
        data_dir=data_ts_dir_cluster
    )
    globals()[name] = ts
# iterate over loaded timeseries: convert the index to datetime first,
# then infer and assign the frequency (assigning before the conversion would lose it)
for name in ts_names_load:
    ts = globals()[name]
    ts.index = pd.to_datetime(ts.index)
    inferred_freq = pd.infer_freq(ts.index)
    ts.index.freq = inferred_freq
    logging.info(f"Assigned and formatted timeseries ({inferred_freq}) for {name}.")
Available subdirectories:
1: cluster_Cluster_4
2: cluster_Cluster_3
3: cluster_Cluster_0
4: cluster_Cluster_5
5: cluster_Cluster_6
6: cluster_Cluster_1
7: cluster_Cluster_2
Enter the number of the subdirectory you want to choose: 3
2025-01-27 19:15:53,194 - INFO - 50_timeseries_analysis.ipynb: Set cluster data directory to './data/scope/reclustered_1000/cluster_Cluster_0'.
2025-01-27 19:15:53,196 - INFO - 50_timeseries_analysis.ipynb: Set cluster timeseries data directory to './data/scope/reclustered_1000/cluster_Cluster_0'.
2025-01-27 19:15:53,471 - INFO - load_timeseries: Timeseries assigned from 'data/scope/reclustered_1000/cluster_Cluster_0/eq_ts_daily_multi.csv'.
2025-01-27 19:15:53,498 - INFO - load_timeseries: Timeseries assigned from 'data/scope/reclustered_1000/cluster_Cluster_0/eq_ts_weekly_multi.csv'.
2025-01-27 19:15:53,567 - INFO - load_timeseries: Timeseries assigned from 'data/scope/reclustered_1000/cluster_Cluster_0/eq_ts_monthly_multi.csv'.
2025-01-27 19:15:53,627 - INFO - load_timeseries: Timeseries assigned from 'data/scope/reclustered_1000/cluster_Cluster_0/eq_ts_yearly_multi.csv'.
2025-01-27 19:15:53,719 - INFO - load_timeseries: Timeseries assigned from 'data/scope/reclustered_1000/cluster_Cluster_0/erupt_ts_daily_multi.csv'.
2025-01-27 19:15:53,778 - INFO - load_timeseries: Timeseries assigned from 'data/scope/reclustered_1000/cluster_Cluster_0/erupt_ts_weekly_multi.csv'.
2025-01-27 19:15:53,843 - INFO - load_timeseries: Timeseries assigned from 'data/scope/reclustered_1000/cluster_Cluster_0/erupt_ts_monthly_multi.csv'.
2025-01-27 19:15:53,912 - INFO - load_timeseries: Timeseries assigned from 'data/scope/reclustered_1000/cluster_Cluster_0/erupt_ts_yearly_multi.csv'.
2025-01-27 19:15:53,931 - INFO - Assigned and formatted timeseries (D) for eq_ts_daily_multi.
2025-01-27 19:15:53,935 - INFO - Assigned and formatted timeseries (W-SUN) for eq_ts_weekly_multi.
2025-01-27 19:15:53,938 - INFO - Assigned and formatted timeseries (M) for eq_ts_monthly_multi.
2025-01-27 19:15:53,940 - INFO - Assigned and formatted timeseries (A-DEC) for eq_ts_yearly_multi.
2025-01-27 19:15:53,951 - INFO - Assigned and formatted timeseries (D) for erupt_ts_daily_multi.
2025-01-27 19:15:53,953 - INFO - Assigned and formatted timeseries (W-SUN) for erupt_ts_weekly_multi.
2025-01-27 19:15:53,956 - INFO - Assigned and formatted timeseries (M) for erupt_ts_monthly_multi.
2025-01-27 19:15:53,958 - INFO - Assigned and formatted timeseries (A-DEC) for erupt_ts_yearly_multi.
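The loop above relies on pandas frequency inference (yielding 'D', 'W-SUN', 'M' and 'A-DEC' in the log); in isolation the mechanism behaves like this:

```python
import pandas as pd

# a daily index is inferred as 'D'; a weekly-on-Sunday index as 'W-SUN'
daily = pd.date_range("1970-01-01", periods=10, freq="D")
weekly = pd.date_range("1970-01-04", periods=10, freq="W-SUN")
freq_daily = pd.infer_freq(daily)
freq_weekly = pd.infer_freq(weekly)
```

Assigning the inferred string back to `index.freq` lets later statsmodels calls use the frequency without re-deriving it.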
# show basic timeseries summaries
for name in ts_names_load:
    ts = globals()[name]
    print(f"===== Timeseries summary for {name}:=====")
    print(f"{summarize_time_series(ts)}, {type(ts)}, {ts.columns}\n")
===== Timeseries summary for eq_ts_daily_multi:=====
{'number_of_entries': 18262, 'first_date': Timestamp('1970-01-01 00:00:00'), 'last_date': Timestamp('2019-12-31 00:00:00'), 'ts_type': <class 'pandas.core.frame.DataFrame'>, 'ts_freq_index': 'D', 'ts_freq_inferred': 'D'}, <class 'pandas.core.frame.DataFrame'>, Index(['count', 'mag_max', 'energy_sum', 'energy_max', 'tsunami_sum',
'felt_mean', 'cdi_mean', 'nst_mean', 'mmi_mean'],
dtype='object')
===== Timeseries summary for eq_ts_weekly_multi:=====
{'number_of_entries': 2610, 'first_date': Timestamp('1970-01-04 00:00:00'), 'last_date': Timestamp('2020-01-05 00:00:00'), 'ts_type': <class 'pandas.core.frame.DataFrame'>, 'ts_freq_index': 'W-SUN', 'ts_freq_inferred': 'W-SUN'}, <class 'pandas.core.frame.DataFrame'>, Index(['count', 'mag_max', 'energy_sum', 'energy_max', 'tsunami_sum',
'felt_mean', 'cdi_mean', 'nst_mean', 'mmi_mean'],
dtype='object')
===== Timeseries summary for eq_ts_monthly_multi:=====
{'number_of_entries': 600, 'first_date': Timestamp('1970-01-31 00:00:00'), 'last_date': Timestamp('2019-12-31 00:00:00'), 'ts_type': <class 'pandas.core.frame.DataFrame'>, 'ts_freq_index': 'M', 'ts_freq_inferred': 'M'}, <class 'pandas.core.frame.DataFrame'>, Index(['count', 'mag_max', 'energy_sum', 'energy_max', 'tsunami_sum',
'felt_mean', 'cdi_mean', 'nst_mean', 'mmi_mean'],
dtype='object')
===== Timeseries summary for eq_ts_yearly_multi:=====
{'number_of_entries': 50, 'first_date': Timestamp('1970-12-31 00:00:00'), 'last_date': Timestamp('2019-12-31 00:00:00'), 'ts_type': <class 'pandas.core.frame.DataFrame'>, 'ts_freq_index': 'A-DEC', 'ts_freq_inferred': 'A-DEC'}, <class 'pandas.core.frame.DataFrame'>, Index(['count', 'mag_max', 'energy_sum', 'energy_max', 'tsunami_sum',
'felt_mean', 'cdi_mean', 'nst_mean', 'mmi_mean'],
dtype='object')
===== Timeseries summary for erupt_ts_daily_multi:=====
{'number_of_entries': 18262, 'first_date': Timestamp('1970-01-01 00:00:00'), 'last_date': Timestamp('2019-12-31 00:00:00'), 'ts_type': <class 'pandas.core.frame.DataFrame'>, 'ts_freq_index': 'D', 'ts_freq_inferred': 'D'}, <class 'pandas.core.frame.DataFrame'>, Index(['StartingEruptions_count', 'ExplosivityIndexMax_max',
'DailyEruptedVolume_sum', 'ActiveEruptions'],
dtype='object')
===== Timeseries summary for erupt_ts_weekly_multi:=====
{'number_of_entries': 2610, 'first_date': Timestamp('1970-01-04 00:00:00'), 'last_date': Timestamp('2020-01-05 00:00:00'), 'ts_type': <class 'pandas.core.frame.DataFrame'>, 'ts_freq_index': 'W-SUN', 'ts_freq_inferred': 'W-SUN'}, <class 'pandas.core.frame.DataFrame'>, Index(['StartingEruptions_count', 'ActiveEruptions', 'ExplosivityIndexMax_max',
'DailyEruptedVolume_sum'],
dtype='object')
===== Timeseries summary for erupt_ts_monthly_multi:=====
{'number_of_entries': 600, 'first_date': Timestamp('1970-01-31 00:00:00'), 'last_date': Timestamp('2019-12-31 00:00:00'), 'ts_type': <class 'pandas.core.frame.DataFrame'>, 'ts_freq_index': 'M', 'ts_freq_inferred': 'M'}, <class 'pandas.core.frame.DataFrame'>, Index(['StartingEruptions_count', 'ActiveEruptions', 'ExplosivityIndexMax_max',
'DailyEruptedVolume_sum'],
dtype='object')
===== Timeseries summary for erupt_ts_yearly_multi:=====
{'number_of_entries': 50, 'first_date': Timestamp('1970-12-31 00:00:00'), 'last_date': Timestamp('2019-12-31 00:00:00'), 'ts_type': <class 'pandas.core.frame.DataFrame'>, 'ts_freq_index': 'A-DEC', 'ts_freq_inferred': 'A-DEC'}, <class 'pandas.core.frame.DataFrame'>, Index(['StartingEruptions_count', 'ActiveEruptions', 'ExplosivityIndexMax_max',
'DailyEruptedVolume_sum'],
dtype='object')
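summarize_time_series lives in shared_procedures.py; judging from the dictionaries printed above, a minimal equivalent could look like this (field names copied from the output, implementation assumed):

```python
import pandas as pd

def summarize_time_series_sketch(ts):
    """Collect the basic facts reported above: length, date span, type and frequency."""
    return {
        "number_of_entries": len(ts),
        "first_date": ts.index[0],
        "last_date": ts.index[-1],
        "ts_type": type(ts),
        "ts_freq_index": ts.index.freqstr,
        "ts_freq_inferred": pd.infer_freq(ts.index),
    }

# demo on a small daily frame shaped like the loaded timeseries
demo = pd.DataFrame(
    {"count": range(10)},
    index=pd.date_range("1970-01-01", periods=10, freq="D"),
)
summary = summarize_time_series_sketch(demo)
```

Comparing `ts_freq_index` with `ts_freq_inferred` is a quick consistency check that the assigned frequency actually matches the spacing of the index.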
# defining the daily plots
plot_definitions = [
# earthquakes daily
['eq_ts_daily_multi', 'count', False, False, 1, 'Earthquakes dataset'],
['eq_ts_daily_multi', 'energy_max', False, True, 1, 'Earthquakes dataset'],
['eq_ts_daily_multi', 'energy_sum', False, True, 1, 'Earthquakes dataset'],
# eruptions daily
['erupt_ts_daily_multi', 'StartingEruptions_count', False, False, 1, 'Eruptions dataset'],
['erupt_ts_daily_multi', 'ActiveEruptions', True, False, 1, 'Eruptions dataset'],
['erupt_ts_daily_multi', 'ExplosivityIndexMax_max', False, False, 1, 'Eruptions dataset'],
['erupt_ts_daily_multi', 'DailyEruptedVolume_sum', False, True, 1, 'Eruptions dataset'],
]
# loop through plot definitions and generate plots
for definition in plot_definitions:
    ts_name, feature_name, plot_lines, logarithmic, dot_size, topic = definition
    ts = globals()[ts_name][feature_name]
    plot_timeseries(ts=ts,
                    feature_name=feature_name,
                    plot_lines=plot_lines,
                    logarithmic=logarithmic,
                    dot_size=dot_size,
                    figsize=(14, 3),
                    topic=topic
                    )
# defining the weekly plots
plot_definitions = [
# earthquakes weekly
['eq_ts_weekly_multi', 'count', False, False, 1, 'Earthquakes dataset'],
['eq_ts_weekly_multi', 'energy_max', False, True, 1, 'Earthquakes dataset'],
['eq_ts_weekly_multi', 'energy_sum', False, True, 1, 'Earthquakes dataset'],
# eruptions weekly
['erupt_ts_weekly_multi', 'StartingEruptions_count', False, False, 1, 'Eruptions dataset'],
['erupt_ts_weekly_multi', 'ActiveEruptions', True, False, 1, 'Eruptions dataset'],
['erupt_ts_weekly_multi', 'ExplosivityIndexMax_max', False, False, 1, 'Eruptions dataset'],
['erupt_ts_weekly_multi', 'DailyEruptedVolume_sum', False, True, 1, 'Eruptions dataset'],
]
# loop through plot definitions and generate plots
for definition in plot_definitions:
    ts_name, feature_name, plot_lines, logarithmic, dot_size, topic = definition
    ts = globals()[ts_name][feature_name]
    plot_timeseries(ts=ts,
                    feature_name=feature_name,
                    plot_lines=plot_lines,
                    logarithmic=logarithmic,
                    dot_size=dot_size,
                    figsize=(14, 3),
                    topic=topic
                    )
# defining the monthly plots
plot_definitions = [
# earthquakes monthly
['eq_ts_monthly_multi', 'count', True, False, 1, 'Earthquakes dataset'],
['eq_ts_monthly_multi', 'energy_max', True, True, 1, 'Earthquakes dataset'],
['eq_ts_monthly_multi', 'energy_sum', True, True, 1, 'Earthquakes dataset'],
# eruptions
['erupt_ts_monthly_multi', 'StartingEruptions_count', False, False, 1, 'Eruptions dataset'],
['erupt_ts_monthly_multi', 'ActiveEruptions', True, False, 1, 'Eruptions dataset'],
['erupt_ts_monthly_multi', 'ExplosivityIndexMax_max', False, False, 1, 'Eruptions dataset'],
['erupt_ts_monthly_multi', 'DailyEruptedVolume_sum', False, True, 1, 'Eruptions dataset'],
]
# loop through plot definitions and generate plots
for definition in plot_definitions:
    ts_name, feature_name, plot_lines, logarithmic, dot_size, topic = definition
    ts = globals()[ts_name][feature_name]
    plot_timeseries(ts=ts,
                    feature_name=feature_name,
                    plot_lines=plot_lines,
                    logarithmic=logarithmic,
                    dot_size=dot_size,
                    figsize=(14, 3),
                    topic=topic
                    )
# defining the yearly plots
plot_definitions = [
# earthquakes yearly
['eq_ts_yearly_multi', 'count', True, False, 15, 'Earthquakes dataset'],
['eq_ts_yearly_multi', 'energy_max', True, True, 15, 'Earthquakes dataset'],
['eq_ts_yearly_multi', 'energy_sum', True, True, 15, 'Earthquakes dataset'],
# eruptions yearly
['erupt_ts_yearly_multi', 'StartingEruptions_count', True, False, 15, 'Eruptions dataset'],
['erupt_ts_yearly_multi', 'ActiveEruptions', True, False, 15, 'Eruptions dataset'],
['erupt_ts_yearly_multi', 'ExplosivityIndexMax_max', True, False, 15, 'Eruptions dataset'],
['erupt_ts_yearly_multi', 'DailyEruptedVolume_sum', True, True, 15, 'Eruptions dataset'],
]
# loop through plot definitions and generate plots
for definition in plot_definitions:
    ts_name, feature_name, plot_lines, logarithmic, dot_size, topic = definition
    ts = globals()[ts_name][feature_name]
    plot_timeseries(ts=ts,
                    feature_name=feature_name,
                    plot_lines=plot_lines,
                    logarithmic=logarithmic,
                    dot_size=dot_size,
                    figsize=(14, 3),
                    topic=topic
                    )
# get the imputed timeseries
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_yearly_multi['count'], identification='Earthquakes Count')).values
# Timeseries Rupture Plot
max_pen = 2
algo = rpt.Pelt(model="rbf").fit(ts1_data)  # fit once; only the penalty changes below
for pen in range(1, max_pen + 1):
    print(f"Timeseries Rupture Plot, pen={pen}")
    result = algo.predict(pen=pen)
    print(result)
    rpt.display(ts1_data, result)
    if pen == max_pen:
        figurename = (f"50_TRP_earthquakes_count_pen({pen})_result{result}.png").replace("[", "_").replace("]", "_").replace(",", "_")
        plt.savefig(os.path.join(data_ts_dir_cluster, figurename), format='png', dpi=150, bbox_inches='tight')
        logging.info(f"{script_name}: {figurename} to './{data_ts_dir_cluster}'.")
    plt.show()
No NaN found in (Earthquakes Count)!
Timeseries Rupture Plot, pen=1
[5, 25, 50]
2025-01-27 19:15:58,441 - INFO - 50_timeseries_analysis.ipynb: 50_TRP_earthquakes_count_pen(2)_result_5_ 50_.png to './data/scope/reclustered_1000/cluster_Cluster_0'.
Timeseries Rupture Plot, pen=2
[5, 50]
# get the imputed timeseries
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_yearly_multi['energy_sum'], identification='Released earthquake energy')).values
# Timeseries Rupture Plot
max_pen = 2
algo = rpt.Pelt(model="rbf").fit(ts1_data)  # fit once; only the penalty changes below
for pen in range(1, max_pen + 1):
    print(f"Timeseries Rupture Plot, pen={pen}")
    result = algo.predict(pen=pen)
    print(result)
    rpt.display(ts1_data, result)
    if pen == max_pen:
        figurename = (f"50_TRP_earthquakes_energy_pen({pen})_result{result}.png").replace("[", "_").replace("]", "_").replace(",", "_")
        plt.savefig(os.path.join(data_ts_dir_cluster, figurename), format='png', dpi=150, bbox_inches='tight')
        logging.info(f"{script_name}: {figurename} to './{data_ts_dir_cluster}'.")
    plt.show()
No NaN found in (Released earthquake energy)!
Timeseries Rupture Plot, pen=1
[50]
Timeseries Rupture Plot, pen=2
[50]
2025-01-27 19:15:58,768 - INFO - 50_timeseries_analysis.ipynb: 50_TRP_earthquakes_energy_pen(2)_result_50_.png to './data/scope/reclustered_1000/cluster_Cluster_0'.
# get the imputed timeseries
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_yearly_multi['mag_max'], identification='Max earthquake magnitude')).values
# Timeseries Rupture Plot
max_pen = 2
algo = rpt.Pelt(model="rbf").fit(ts1_data)  # fit once; only the penalty changes below
for pen in range(1, max_pen + 1):
    print(f"Timeseries Rupture Plot, pen={pen}")
    result = algo.predict(pen=pen)
    print(result)
    rpt.display(ts1_data, result)
    if pen == max_pen:
        figurename = (f"50_TRP_earthquakes_max_magnitude_pen({pen})_result{result}.png").replace("[", "_").replace("]", "_").replace(",", "_")
        plt.savefig(os.path.join(data_ts_dir_cluster, figurename), format='png', dpi=150, bbox_inches='tight')
        logging.info(f"{script_name}: {figurename} to './{data_ts_dir_cluster}'.")
    plt.show()
No NaN found in (Max earthquake magnitude)!
Timeseries Rupture Plot, pen=1
[50]
2025-01-27 19:15:58,975 - INFO - 50_timeseries_analysis.ipynb: 50_TRP_earthquakes_max_energy_pen(2)_result_50_.png to './data/scope/reclustered_1000/cluster_Cluster_0'.
Timeseries Rupture Plot, pen=2
[50]
# get the imputed timeseries
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_yearly_multi['StartingEruptions_count'], identification='Starting eruptions')).values
# Timeseries Rupture Plot
max_pen = 2
algo = rpt.Pelt(model="rbf").fit(ts1_data)  # fit once; only the penalty changes below
for pen in range(1, max_pen + 1):
    print(f"Timeseries Rupture Plot, pen={pen}")
    result = algo.predict(pen=pen)
    print(result)
    rpt.display(ts1_data, result)
    if pen == max_pen:
        figurename = (f"50_TRP_eruptions_StartingEruptions_pen({pen})_result{result}.png").replace("[", "_").replace("]", "_").replace(",", "_")
        plt.savefig(os.path.join(data_ts_dir_cluster, figurename), format='png', dpi=150, bbox_inches='tight')
        logging.info(f"{script_name}: {figurename} to './{data_ts_dir_cluster}'.")
    plt.show()
No NaN found in (Starting eruptions)!
Timeseries Rupture Plot, pen=1
[50]
2025-01-27 19:15:59,188 - INFO - 50_timeseries_analysis.ipynb: 50_TRP_earthquakes_StartingEruptions_pen(2)_result_50_.png to './data/scope/reclustered_1000/cluster_Cluster_0'.
Timeseries Rupture Plot, pen=2
[50]
# get the imputed timeseries
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_yearly_multi['DailyEruptedVolume_sum'], identification='Erupted Volume')).values
# Timeseries Rupture Plot
max_pen = 2
algo = rpt.Pelt(model="rbf").fit(ts1_data)  # fit once; only the penalty changes below
for pen in range(1, max_pen + 1):
    print(f"Timeseries Rupture Plot, pen={pen}")
    result = algo.predict(pen=pen)
    print(result)
    rpt.display(ts1_data, result)
    if pen == max_pen:
        figurename = (f"50_TRP_eruptions_EruptedVolume_pen({pen})_result{result}.png").replace("[", "_").replace("]", "_").replace(",", "_")
        plt.savefig(os.path.join(data_ts_dir_cluster, figurename), format='png', dpi=150, bbox_inches='tight')
        logging.info(f"{script_name}: {figurename} to './{data_ts_dir_cluster}'.")
    plt.show()
No NaN found in (Erupted Volume)!
Timeseries Rupture Plot, pen=1
[10, 15, 40, 50]
2025-01-27 19:15:59,405 - INFO - 50_timeseries_analysis.ipynb: 50_TRP_earthquakes_EruptedVolume_pen(2)_result_50_.png to './data/scope/reclustered_1000/cluster_Cluster_0'.
Timeseries Rupture Plot, pen=2
[50]
# get the imputed timeseries
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_yearly_multi['ExplosivityIndexMax_max'], identification='Max VEI')).values
# Timeseries Rupture Plot
max_pen = 2
for pen in range(1, max_pen+1):
    print(f"Timeseries Rupture Plot, pen={pen}")
    algo = rpt.Pelt(model="rbf").fit(ts1_data)
    result = algo.predict(pen=pen)
    print(result)
    rpt.display(ts1_data, result)
    if pen == max_pen:
        figurename = (f"50_TRP_earthquakes_max_ExplosivityIndexMax_pen({pen})_result{result}.png").replace("[", "_").replace("]", "_").replace(",", "_")
        plt.savefig(os.path.join(data_ts_dir_cluster, figurename), format='png', dpi=150, bbox_inches='tight')
        logging.info(f"{script_name}: {figurename} to './{data_ts_dir_cluster}'.")
    plt.show()
No NaN found in (Max VEI)!
Timeseries Rupture Plot, pen=1
[30, 35, 40, 50]
Timeseries Rupture Plot, pen=2
[50]
2025-01-27 19:15:59,615 - INFO - 50_timeseries_analysis.ipynb: 50_TRP_earthquakes_max_ExplosivityIndexMax_pen(2)_result_50_.png to './data/scope/reclustered_1000/cluster_Cluster_0'.
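The Pelt search above minimizes a segmentation cost plus a penalty per breakpoint, so a larger `pen` yields fewer change points. As a minimal illustration of the underlying idea (not the PELT algorithm itself, and independent of the `ruptures` package), a single change point in the mean can be found by exhaustively minimizing the two-segment squared error:

```python
import numpy as np

def single_changepoint(signal: np.ndarray) -> int:
    """Return the split index minimizing the summed squared error
    of a two-segment piecewise-constant fit."""
    n = len(signal)
    best_idx, best_cost = 1, np.inf
    for split in range(1, n):
        left, right = signal[:split], signal[split:]
        cost = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if cost < best_cost:
            best_idx, best_cost = split, cost
    return best_idx

# synthetic step signal: mean jumps from 0 to 5 at index 30
signal = np.concatenate([np.zeros(30), np.full(20, 5.0)])
print(single_changepoint(signal))  # → 30
```

PELT performs essentially this minimization recursively over all segments, pruning candidate splits that can no longer be optimal, which keeps the search linear in practice.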
# ACF, PACF daily plots
lags = 2*31
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_daily_multi['count'], identification='Earthquakes, Count'))
plot_acf_pacf_series(ts1_data, feature_name="count", lags=lags, title="Earthquakes, Count", xlabel="Days")
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_daily_multi['energy_sum'], identification='Earthquakes, Released Energy'))
plot_acf_pacf_series(ts1_data, feature_name="energy_sum", lags=lags, title="Earthquakes, Released Energy", xlabel="Days")
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_daily_multi['mag_max'], identification='Earthquakes, max Magnitude'))
plot_acf_pacf_series(ts1_data, feature_name="mag_max", lags=lags, title="Earthquakes, max Magnitude", xlabel="Days")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_daily_multi['StartingEruptions_count'], identification='Eruptions, Count'))
plot_acf_pacf_series(ts1_data, feature_name="StartingEruptions_count", lags=lags, title="Eruptions, Count", xlabel="Days")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_daily_multi['DailyEruptedVolume_sum'], identification='Eruptions, Erupted Volume'))
plot_acf_pacf_series(ts1_data, feature_name="DailyEruptedVolume_sum", lags=lags, title="Eruptions, Erupted Volume", xlabel="Days")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_daily_multi['ExplosivityIndexMax_max'], identification='Eruptions, Max VEI'))
plot_acf_pacf_series(ts1_data, feature_name="ExplosivityIndexMax_max", lags=lags, title="Eruptions, Max VEI", xlabel="Days")
Number of NaN 10646, imputing (Earthquakes, Count) with mean value!
Number of NaN 10646, imputing (Earthquakes, Released Energy) with mean value!
Number of NaN 10646, imputing (Earthquakes, max Magnitude) with mean value!
Number of NaN 18047, imputing (Eruptions, Count) with mean value!
Number of NaN 18047, imputing (Eruptions, Erupted Volume) with mean value!
Number of NaN 18047, imputing (Eruptions, Max VEI) with mean value!
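`impute_timeseries` is defined in `shared_procedures.py`; the log messages above show that it falls back to the series mean. A minimal sketch of that behaviour (the helper name and signature here are illustrative, not the project's actual implementation):

```python
import numpy as np
import pandas as pd

def impute_with_mean(ts: pd.Series, identification: str = "") -> pd.Series:
    """Replace NaN values with the series mean and report what happened."""
    n_nan = ts.isna().sum()
    if n_nan == 0:
        print(f"No NaN found in ({identification})!")
        return ts
    print(f"Number of NaN {n_nan}, imputing ({identification}) with mean value!")
    return ts.fillna(ts.mean())

ts = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])
imputed = impute_with_mean(ts, "demo")
print(imputed.tolist())  # → [1.0, 3.0, 3.0, 3.0, 5.0]
```

Mean imputation keeps the series length and mean intact for the ACF/PACF plots, at the cost of damping the variance where many values are missing, as in the daily eruption series here.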
# ACF, PACF weekly plots
lags = 52
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_weekly_multi['count'], identification='Earthquakes, Count'))
plot_acf_pacf_series(ts1_data, feature_name="count", lags=lags, title="Earthquakes, Count", xlabel="Weeks")
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_weekly_multi['energy_sum'], identification='Earthquakes, Released Energy'))
plot_acf_pacf_series(ts1_data, feature_name="energy_sum", lags=lags, title="Earthquakes, Released Energy", xlabel="Weeks")
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_weekly_multi['mag_max'], identification='Earthquakes, Max Magnitude'))
plot_acf_pacf_series(ts1_data, feature_name="mag_max", lags=lags, title="Earthquakes, Max Magnitude", xlabel="Weeks")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_weekly_multi['StartingEruptions_count'], identification='Eruptions, Count'))
plot_acf_pacf_series(ts1_data, feature_name="StartingEruptions_count", lags=lags, title="Eruptions, Count", xlabel="Weeks")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_weekly_multi['DailyEruptedVolume_sum'], identification='Eruptions, Erupted Volume'))
plot_acf_pacf_series(ts1_data, feature_name="DailyEruptedVolume_sum", lags=lags, title="Eruptions, Erupted Volume", xlabel="Weeks")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_weekly_multi['ExplosivityIndexMax_max'], identification='Eruptions, max VEI'))
plot_acf_pacf_series(ts1_data, feature_name="ExplosivityIndexMax_max", lags=lags, title="Eruptions, max VEI", xlabel="Weeks")
No NaN found in (Earthquakes, Count)!
No NaN found in (Earthquakes, Released Energy)!
Number of NaN 124, imputing (Earthquakes, Max Magnitude) with mean value!
Number of NaN 2401, imputing (Eruptions, Count) with mean value!
No NaN found in (Eruptions, Erupted Volume)!
Number of NaN 2401, imputing (Eruptions, max VEI) with mean value!
# ACF, PACF monthly plots
lags = 60
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_monthly_multi['count'], identification='Earthquakes, Count'))
plot_acf_pacf_series(ts1_data, feature_name="count", lags=lags, title="Earthquakes, Count", xlabel="Months")
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_monthly_multi['energy_sum'], identification='Earthquakes, Released Energy'))
plot_acf_pacf_series(ts1_data, feature_name="energy_sum", lags=lags, title="Earthquakes, Released Energy", xlabel="Months")
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_monthly_multi['energy_max'], identification='Earthquakes, max Released Energy'))
plot_acf_pacf_series(ts1_data, feature_name="energy_max", lags=lags, title="Earthquakes, max Released Energy", xlabel="Months")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_monthly_multi['StartingEruptions_count'], identification='Eruptions, Count'))
plot_acf_pacf_series(ts1_data, feature_name="StartingEruptions_count", lags=lags, title="Eruptions, Count", xlabel="Months")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_monthly_multi['DailyEruptedVolume_sum'], identification='Eruptions, Erupted Volume'))
plot_acf_pacf_series(ts1_data, feature_name="DailyEruptedVolume_sum", lags=lags, title="Eruptions, Erupted Volume", xlabel="Months")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_monthly_multi['ExplosivityIndexMax_max'], identification='Eruptions, max VEI'))
plot_acf_pacf_series(ts1_data, feature_name="ExplosivityIndexMax_max", lags=lags, title="Eruptions, max VEI", xlabel="Months")
No NaN found in (Earthquakes, Count)!
No NaN found in (Earthquakes, Released Energy)!
No NaN found in (Earthquakes, max Released Energy)!
Number of NaN 412, imputing (Eruptions, Count) with mean value!
No NaN found in (Eruptions, Erupted Volume)!
Number of NaN 412, imputing (Eruptions, max VEI) with mean value!
# ACF, PACF yearly plots
# PACF can only be computed for lags up to 50% of the sample size
lags_pacf = 25
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_yearly_multi['count'], identification='Earthquakes, Count'))
plot_acf_pacf_series(ts1_data, feature_name="count", lags=lags_pacf, title="Earthquakes, Count", xlabel="Years")
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_yearly_multi['energy_sum'], identification='Earthquakes, Released Energy'))
plot_acf_pacf_series(ts1_data, feature_name="energy_sum", lags=lags_pacf, title="Earthquakes, Released Energy", xlabel="Years")
ts1_data = pd.DataFrame(impute_timeseries(ts=eq_ts_yearly_multi['energy_max'], identification='Earthquakes, max Released Energy'))
plot_acf_pacf_series(ts1_data, feature_name="energy_max", lags=lags_pacf, title="Earthquakes, max Released Energy", xlabel="Years")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_yearly_multi['StartingEruptions_count'], identification='Eruptions, Count'))
plot_acf_pacf_series(ts1_data, feature_name="StartingEruptions_count", lags=lags_pacf, title="Eruptions, Count", xlabel="Years")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_yearly_multi['DailyEruptedVolume_sum'], identification='Eruptions, Erupted Volume'))
plot_acf_pacf_series(ts1_data, feature_name="DailyEruptedVolume_sum", lags=lags_pacf, title="Eruptions, Erupted Volume", xlabel="Years")
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_yearly_multi['ExplosivityIndexMax_max'], identification='Eruptions, max VEI'))
plot_acf_pacf_series(ts1_data, feature_name="ExplosivityIndexMax_max", lags=lags_pacf, title="Eruptions, max VEI", xlabel="Years")
No NaN found in (Earthquakes, Count)!
No NaN found in (Earthquakes, Released Energy)!
No NaN found in (Earthquakes, max Released Energy)!
No NaN found in (Eruptions, Count)!
No NaN found in (Eruptions, Erupted Volume)!
No NaN found in (Eruptions, max VEI)!
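`plot_acf_pacf_series` wraps the statsmodels autocorrelation plots; the coefficient shown at each lag can be computed by hand with the standard biased estimator (a sketch, not the project's function):

```python
import numpy as np

def acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelation r_k = sum((x_t - m)(x_{t+k} - m)) / sum((x_t - m)^2)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = (dev**2).sum()
    return np.array([(dev[:len(x) - k] * dev[k:]).sum() / denom
                     for k in range(max_lag + 1)])

print(acf(np.array([1.0, 2.0, 3.0, 4.0]), 1))  # r_0 = 1.0, r_1 = 0.25
```

For the tiny example, the deviations are [-1.5, -0.5, 0.5, 1.5], so the lag-1 numerator is 1.25 over a denominator of 5, giving r_1 = 0.25.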
# FFT for Earthquakes Count (Yearly)
identification = 'Earthquakes Count (Yearly)'
ts1_data = impute_timeseries(ts=eq_ts_yearly_multi['count'], identification=identification)
analyze_and_plot_fft(ts_data=ts1_data, num_n=5, identification=identification)
# STFT
plot_stft(ts1_data.values, fs=1, window_size=64, noverlap=16, title=identification)
No NaN found in (Earthquakes Count (Yearly))!
Most prominent frequencies (cycles/interval):
freq = 0.280000, T = 4, amp = 1004
freq = 0.140000, T = 7, amp = 1163
freq = 0.120000, T = 8, amp = 1093
freq = 0.060000, T = 17, amp = 1401
freq = 0.040000, T = 25, amp = 1532
# FFT for Released Earthquake Energy (Yearly)
identification = 'Released Earthquake Energy (Yearly)'
ts1_data = impute_timeseries(ts=eq_ts_yearly_multi['energy_sum'], identification=identification)
analyze_and_plot_fft(ts_data=ts1_data, num_n=5, identification=identification)
# STFT
plot_stft(ts1_data.values, fs=1, window_size=64, noverlap=16, title=identification)
No NaN found in (Released Earthquake Energy (Yearly))!
Most prominent frequencies (cycles/interval):
freq = 0.280000, T = 4, amp = 3077775974115362816
freq = 0.260000, T = 4, amp = 3059636808393258496
freq = 0.180000, T = 6, amp = 3108705416015720448
freq = 0.120000, T = 8, amp = 3020072280566871040
freq = 0.100000, T = 10, amp = 3080296148317176832
# FFT for Eruption Count (Yearly)
identification = 'Eruptions Count (Yearly)'
ts1_data = impute_timeseries(ts=erupt_ts_yearly_multi['StartingEruptions_count'], identification=identification)
analyze_and_plot_fft(ts_data=ts1_data, num_n=5, identification=identification)
# STFT
plot_stft(ts1_data.values, fs=1, window_size=64, noverlap=16, title=identification)
No NaN found in (Eruptions Count (Yearly))!
Most prominent frequencies (cycles/interval):
freq = 0.400000, T = 2, amp = 14
freq = 0.300000, T = 3, amp = 16
freq = 0.220000, T = 5, amp = 21
freq = 0.200000, T = 5, amp = 18
freq = 0.080000, T = 12, amp = 17
# FFT for Erupted Volume (Yearly)
identification = 'Erupted Volume (Yearly)'
ts1_data = impute_timeseries(ts=erupt_ts_yearly_multi['DailyEruptedVolume_sum'], identification=identification)
analyze_and_plot_fft(ts_data=ts1_data, num_n=5, identification=identification)
# STFT
plot_stft(ts1_data.values, fs=1, window_size=128, noverlap=32, title=identification)
No NaN found in (Erupted Volume (Yearly))!
Most prominent frequencies (cycles/interval):
freq = 0.380000, T = 3, amp = 346431942
freq = 0.360000, T = 3, amp = 360976605
freq = 0.260000, T = 4, amp = 365475925
freq = 0.120000, T = 8, amp = 379668985
freq = 0.100000, T = 10, amp = 366712035
# FFT for Max Earthquake Magnitude (Yearly)
identification = 'Max Earthquake Magnitude (Yearly)'
ts1_data = impute_timeseries(ts=eq_ts_yearly_multi['mag_max'], identification=identification)
analyze_and_plot_fft(ts_data=ts1_data, num_n=5, identification=identification)
# STFT
plot_stft(ts1_data.values, fs=1, window_size=128, noverlap=32, title=identification)
No NaN found in (Max Earthquake Magnitude (Yearly))!
Most prominent frequencies (cycles/interval):
freq = 0.480000, T = 2, amp = 4
freq = 0.280000, T = 4, amp = 4
freq = 0.220000, T = 5, amp = 5
freq = 0.180000, T = 6, amp = 7
freq = 0.100000, T = 10, amp = 4
# FFT for Max VEI (Yearly)
identification = 'Max VEI (Yearly)'
ts1_data = impute_timeseries(ts=erupt_ts_yearly_multi['ExplosivityIndexMax_max'], identification=identification)
analyze_and_plot_fft(ts_data=ts1_data, num_n=5, identification=identification)
# STFT
plot_stft(ts1_data.values, fs=1, window_size=128, noverlap=32, title=identification)
No NaN found in (Max VEI (Yearly))!
Most prominent frequencies (cycles/interval):
freq = 0.400000, T = 2, amp = 10
freq = 0.380000, T = 3, amp = 8
freq = 0.260000, T = 4, amp = 8
freq = 0.100000, T = 10, amp = 8
freq = 0.080000, T = 12, amp = 9
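`analyze_and_plot_fft` reports the most prominent frequencies; extracting them amounts to ranking the real-FFT amplitude spectrum and converting each frequency to a period. A sketch (the function body is illustrative; `num_n` mirrors the parameter used above):

```python
import numpy as np

def top_fft_frequencies(ts_data: np.ndarray, num_n: int = 5, fs: float = 1.0):
    """Return the num_n frequencies with the largest FFT amplitudes
    (DC component excluded) as (frequency, period, amplitude) tuples."""
    ts_data = np.asarray(ts_data, dtype=float)
    amps = np.abs(np.fft.rfft(ts_data - ts_data.mean()))
    freqs = np.fft.rfftfreq(len(ts_data), d=1.0/fs)
    order = np.argsort(amps[1:])[::-1][:num_n] + 1  # skip the DC bin
    return [(freqs[i], round(1.0/freqs[i]), amps[i]) for i in order]

# a pure 10-year cycle sampled yearly over 100 years
t = np.arange(100)
ts = np.sin(2 * np.pi * t / 10)
print(top_fft_frequencies(ts, num_n=1))  # dominant frequency 0.1 → period 10
```

Subtracting the mean first prevents the DC component from dominating the ranking, which matters for strictly positive series such as counts and energies.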
# Seasonal decomposition of daily ActiveEruptions
# because this is obviously a timeseries with trend and seasonal components
# impute data
ts1_data = pd.DataFrame(impute_timeseries(ts=erupt_ts_daily_multi['ActiveEruptions'], identification='active eruptions'))
# decompose the time series (components of trend, seasonality, noise/residuals)
period = 8*365
result_seasonal = seasonal_decompose(ts1_data, model='additive', two_sided=True, period=period)
# plot the decomposition
fig, axes = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
result_seasonal.observed.plot(ax=axes[0], title='Observed', color='darkblue', linewidth=0.5)
result_seasonal.trend.plot(ax=axes[1], title='Trend (8 yrs)', color='darkgreen', linewidth=1.0)
result_seasonal.seasonal.plot(ax=axes[2], title='Seasonality (8 yrs)', color='orange', linewidth=0.5)
result_seasonal.resid.plot(ax=axes[3], title='Residuals (8 yrs)', color='darkred', linewidth=0.5)
plt.tight_layout()
plt.show()
No NaN found in (active eruptions)!
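statsmodels' `seasonal_decompose` estimates the trend with a centered moving average of length `period`, subtracts it, and averages the remainder per seasonal position. The trend step alone can be sketched as follows (assuming an odd period, so a single centered window suffices):

```python
import numpy as np
import pandas as pd

# additive toy series: linear trend plus a period-7 seasonal component
t = np.arange(70)
ts = pd.Series(0.1 * t + np.sin(2 * np.pi * t / 7))

# a centered moving average over one full period cancels the seasonal part
trend = ts.rolling(window=7, center=True).mean()
print(trend.iloc[10])  # ≈ 1.0, i.e. the trend 0.1 * t at t = 10
```

Averaging over exactly one seasonal cycle sums the sinusoid to zero, so only the trend survives; for an even period (as with `period=8*365` above) statsmodels applies a second 2-point average to keep the window centered.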
# exit
sys.exit(1)
An exception has occurred, use %tb to see the full traceback. SystemExit: 1
Cross-correlation workflow, including imputation, handling heteroskedasticity, and handling stationarity.
# select the two timeseries
t1_data = eq_ts_yearly_multi.copy()
ts1_identification = 'Earthquakes Count'
ts1_data = impute_timeseries(ts=t1_data['count'], identification=ts1_identification)
t2_data = erupt_ts_yearly_multi.copy()
ts2_identification = 'Starting Eruptions'
ts2_data = impute_timeseries(ts=t2_data['StartingEruptions_count'], identification=ts2_identification)
# handle non-matching values for Box-Cox transformations in both timeseries
negative_indices1 = ts1_data[ts1_data <= 0].index
print("\nts1: Non-matching values for Box Cox transformation:", len(negative_indices1))
for idx in negative_indices1:
    print(f"Index: {idx}, Value: {ts1_data.loc[idx]}")
# ts1_data = ts1_data.drop(negative_indices1); ts2_data = ts2_data.drop(negative_indices1)
ts1_data.loc[negative_indices1] = 0.000001
negative_indices2 = ts2_data[ts2_data <= 0].index
print("\nts2: Non-matching values for Box Cox transformation:", len(negative_indices2))
for idx in negative_indices2:
    print(f"Index: {idx}, Value: {ts2_data.loc[idx]}")
# ts1_data = ts1_data.drop(negative_indices2); ts2_data = ts2_data.drop(negative_indices2)
ts2_data.loc[negative_indices2] = 0.000001
No NaN found in (Earthquakes Count)!
No NaN found in (Starting Eruptions)!
ts1: Non-matching values for Box Cox transformation: 0
ts2: Non-matching values for Box Cox transformation: 0
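The clamping to 0.000001 above exists because the Box-Cox transform is only defined for strictly positive data. At λ = 0 the transform reduces to the natural logarithm, which `scipy.stats.boxcox` (the function commented out below) implements directly; a minimal check of that identity:

```python
import numpy as np
from scipy.stats import boxcox

x = np.array([1.0, 2.0, 4.0, 8.0])
transformed = boxcox(x, lmbda=0.0)          # Box-Cox with a fixed lambda
print(np.allclose(transformed, np.log(x)))  # → True
```

With `lmbda=None`, `boxcox` would instead return the transformed data together with a maximum-likelihood estimate of λ; here the notebook fixes λ manually per series.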
# check for heteroskedasticity
try:
    display(perform_heteroskedasticity_tests(ts1_data))
except Exception:
    print("Error for statistical tests heteroskedasticity for ts1")
try:
    display(perform_heteroskedasticity_tests(ts2_data))
except Exception:
    print("Error for statistical tests heteroskedasticity for ts2")
# Manually apply a Box-Cox transformation if heteroskedasticity is detected
# lambda_value = 6.5
# transformed_values1 = boxcox(ts1_data, lmbda=lambda_value)
# ts1_transformed = pd.Series(transformed_values1, index=ts1_data.index)
# lambda_value = 13.0
# transformed_values2 = boxcox(ts2_data, lmbda=lambda_value)
# ts2_transformed = pd.Series(transformed_values2, index=ts2_data.index)
# check again for heteroskedasticity
# display(perform_heteroskedasticity_tests(ts1_transformed))
# display(perform_heteroskedasticity_tests(ts2_transformed))
# use the original data if the tests pass or could not be run
ts1_transformed = ts1_data
ts2_transformed = ts2_data
| Statistic | p-value | Hypothesis | Heteroskedasticity | |
|---|---|---|---|---|
| ARCH | 2.557884 | 0.990005 | FAIL to reject H0 | no |
| Breusch-Pagan | 0.168739 | 0.681235 | FAIL to reject H0 | no |
| White | 1.092538 | 0.579107 | FAIL to reject H0 | no |
| Statistic | p-value | Hypothesis | Heteroskedasticity | |
|---|---|---|---|---|
| ARCH | 9.033038 | 0.528971 | FAIL to reject H0 | no |
| Breusch-Pagan | 0.180992 | 0.670522 | FAIL to reject H0 | no |
| White | 0.448855 | 0.798974 | FAIL to reject H0 | no |
# manual workflow to handle stationarity (and difference if needed)
# plot timeseries for checking effects
# ======= ts1 =======
ts1_result_stationarity = check_stationarity(ts1_transformed, ts1_identification)
ts1_transformed.plot();plt.show()
print(f"(1) Differencing {ts1_identification}")
ts1_processed = ts1_transformed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
# print(f"(2) Differencing {ts1_identification}")
# ts1_processed = ts1_processed.diff().dropna()
# ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
# ts1_processed.plot();plt.show()
# print(f"(3) Differencing {ts1_identification}")
# ts1_processed = ts1_processed.diff().dropna()
# ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
# ts1_processed.plot();plt.show()
# ======= ts2 =======
print()
ts2_result_stationarity = check_stationarity(ts2_transformed, ts2_identification)
ts2_transformed.plot();plt.show()
print(f"(1) Differencing {ts2_identification}")
ts2_processed = ts2_transformed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
# print(f"(2) Differencing {ts2_identification}")
# ts2_processed = ts2_processed.diff().dropna()
# ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
# ts2_processed.plot();plt.show()
# print(f"(3) Differencing {ts2_identification}")
# ts2_processed = ts2_processed.diff().dropna()
# ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
# ts2_processed.plot();plt.show()
Testing stationarity for Earthquakes Count
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -5.604654 | 0.000001 | 0 | Reject H0 | yes |
| KPSS | 0.149731 | 0.1 | 2 | FAIL to reject H0 | yes |
(1) Differencing Earthquakes Count
Testing stationarity for Earthquakes Count
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -7.287851 | 0.0 | 1 | Reject H0 | yes |
| KPSS | 0.278982 | 0.1 | 23 | FAIL to reject H0 | yes |
Testing stationarity for Starting Eruptions
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -7.549589 | 0.0 | 0 | Reject H0 | yes |
| KPSS | 0.358913 | 0.094865 | 2 | FAIL to reject H0 | yes |
(1) Differencing Starting Eruptions
Testing stationarity for Starting Eruptions
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -8.797681 | 0.0 | 2 | Reject H0 | yes |
| KPSS | 0.192489 | 0.1 | 18 | FAIL to reject H0 | yes |
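First differencing removes a deterministic linear trend completely, which is why a single pass is usually enough before the CCF step. A minimal sketch with pandas:

```python
import pandas as pd

# linear trend: non-stationary in the mean
ts = pd.Series([3 + 2 * t for t in range(10)])

# the first difference is constant, hence trend-free
diffed = ts.diff().dropna()
print(diffed.unique().tolist())  # every difference equals 2
```

When ADF and KPSS already agree on stationarity, as they do above for both raw series, differencing is optional, but it is kept here so that both CCF inputs receive identical preprocessing.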
# Call the function with set parameters and different lag-ranges
show_crosscorrelation_results(
    ts1=ts1_processed,
    ts2=ts2_processed,
    ts1_id=ts1_identification,
    ts2_id=ts2_identification,
    plot_type='stem',
    plotwidth=2,
    plotsize=4,
    top_n=5,
    lag_limits=None,
    figsize=(15, 6),
    show_min_max_table=True,
    order_min_max_table=True,
    legend_cols=4,
    clustername=selected_subdir,
    scope=parameters.scope_id,
    figure_save_dir=data_ts_dir_cluster
)
# Call the function with set parameters and different lag-ranges
show_crosscorrelation_results(
    ts1=ts1_processed,
    ts2=ts2_processed,
    ts1_id=ts1_identification,
    ts2_id=ts2_identification,
    plot_type='line',
    plotwidth=2,
    plotsize=4,
    top_n=5,
    lag_limits=(-10, 10),
    figsize=(15, 6),
    show_min_max_table=True,
    order_min_max_table=True,
    legend_cols=4,
    clustername=selected_subdir,
    scope=parameters.scope_id,
    figure_save_dir=data_ts_dir_cluster
)
2025-01-27 19:18:21,528 - INFO - None: 50_CCF_(Earthquakes_Count_Starting_Eruptions)_(A-DEC)_(None).png to './data/scope/reclustered_1000/cluster_Cluster_0'.
CCF min-max values for all lags
| lag | ccf_coeff | ci_lower | ci_upper | sig | rank | label |
|---|---|---|---|---|---|---|
| -5 | -0.152637 | -0.292174 | 0.292174 | 5 | min | |
| 1 | 0.206016 | -0.279995 | 0.279995 | 5 | max | |
| 3 | 0.214521 | -0.285890 | 0.285890 | 4 | max | |
| 4 | -0.471005 | -0.288981 | 0.288981 | sig | 1 | min |
| 5 | 0.426299 | -0.292174 | 0.292174 | sig | 1 | max |
| 6 | -0.320718 | -0.295476 | 0.295476 | sig | 2 | min |
| 7 | 0.279649 | -0.298892 | 0.298892 | 2 | max | |
| 14 | -0.166681 | -0.326661 | 0.326661 | 4 | min | |
| 38 | 0.265381 | -0.565793 | 0.565793 | 3 | max | |
| 39 | -0.225319 | -0.590951 | 0.590951 | 3 | min |
2025-01-27 19:18:22,011 - INFO - None: 50_CCF_(Earthquakes_Count_Starting_Eruptions)_(A-DEC)_(-10_10).png to './data/scope/reclustered_1000/cluster_Cluster_0'.
CCF min-max values for all lags
| lag | ccf_coeff | ci_lower | ci_upper | sig | rank | label |
|---|---|---|---|---|---|---|
| -5 | -0.152637 | -0.292174 | 0.292174 | 5 | min | |
| 1 | 0.206016 | -0.279995 | 0.279995 | 5 | max | |
| 3 | 0.214521 | -0.285890 | 0.285890 | 4 | max | |
| 4 | -0.471005 | -0.288981 | 0.288981 | sig | 1 | min |
| 5 | 0.426299 | -0.292174 | 0.292174 | sig | 1 | max |
| 6 | -0.320718 | -0.295476 | 0.295476 | sig | 2 | min |
| 7 | 0.279649 | -0.298892 | 0.298892 | 2 | max | |
| 14 | -0.166681 | -0.326661 | 0.326661 | 4 | min | |
| 38 | 0.265381 | -0.565793 | 0.565793 | 3 | max | |
| 39 | -0.225319 | -0.590951 | 0.590951 | 3 | min |
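`show_crosscorrelation_results` is defined in `shared_procedures.py`; the coefficients and the lag-dependent ±1.96/√(n−|k|) confidence bands shown in the tables can be sketched as follows (an illustrative reimplementation for non-negative lags, not the project's function):

```python
import numpy as np

def ccf_with_ci(ts1: np.ndarray, ts2: np.ndarray, max_lag: int):
    """Pearson cross-correlation of ts1 against ts2 shifted by lag k (k >= 0),
    with the +/- 1.96/sqrt(n - k) confidence band used in the tables above."""
    n = len(ts1)
    lags, coeffs, ci = [], [], []
    for k in range(max_lag + 1):
        r = np.corrcoef(ts1[:n - k], ts2[k:])[0, 1]
        lags.append(k)
        coeffs.append(r)
        ci.append(1.96 / np.sqrt(n - k))
    return np.array(lags), np.array(coeffs), np.array(ci)

# ts2 is ts1 delayed by 3 steps, so the CCF should peak at lag 3
ts1 = np.sin(np.arange(100) * 0.3)
ts2 = np.concatenate([np.zeros(3), ts1[:-3]])
lags, coeffs, ci = ccf_with_ci(ts1, ts2, max_lag=10)
print(lags[np.argmax(coeffs)])  # → 3
```

Because fewer pairs overlap at larger |k|, the band widens with the lag, which is why the coefficients at lags 38/39 above are large yet still flagged as non-significant.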
# select the two timeseries
t1_data = eq_ts_yearly_multi.copy()
ts1_identification = 'Released Earthquake Energy'
ts1_data = impute_timeseries(ts=t1_data['energy_sum'], identification=ts1_identification)
t2_data = erupt_ts_yearly_multi.copy()
ts2_identification = 'Erupted Volume'
ts2_data = impute_timeseries(ts=t2_data['DailyEruptedVolume_sum'], identification=ts2_identification)
# handle non-matching values for Box-Cox transformations in both timeseries
negative_indices1 = ts1_data[ts1_data <= 0].index
print("\nts1: Non-matching values for Box Cox transformation:", len(negative_indices1))
for idx in negative_indices1:
    print(f"Index: {idx}, Value: {ts1_data.loc[idx]}")
# ts1_data = ts1_data.drop(negative_indices1); ts2_data = ts2_data.drop(negative_indices1)
ts1_data.loc[negative_indices1] = 0.000001
negative_indices2 = ts2_data[ts2_data <= 0].index
print("\nts2: Non-matching values for Box Cox transformation:", len(negative_indices2))
for idx in negative_indices2:
    print(f"Index: {idx}, Value: {ts2_data.loc[idx]}")
# ts1_data = ts1_data.drop(negative_indices2); ts2_data = ts2_data.drop(negative_indices2)
ts2_data.loc[negative_indices2] = 0.000001
No NaN found in (Released Earthquake Energy)!
No NaN found in (Erupted Volume)!
ts1: Non-matching values for Box Cox transformation: 0
ts2: Non-matching values for Box Cox transformation: 0
# check for heteroskedasticity
try:
    display(perform_heteroskedasticity_tests(ts1_data))
except Exception:
    print("Error for statistical tests heteroskedasticity for ts1")
try:
    display(perform_heteroskedasticity_tests(ts2_data))
except Exception:
    print("Error for statistical tests heteroskedasticity for ts2")
# Manually apply a Box-Cox transformation if heteroskedasticity is detected
# lambda_value = 1.5
# transformed_values1 = boxcox(ts1_data, lmbda=lambda_value)
# ts1_transformed = pd.Series(transformed_values1, index=ts1_data.index)
# lambda_value = 1.0
# transformed_values2 = boxcox(ts2_data, lmbda=lambda_value)
# ts2_transformed = pd.Series(transformed_values2, index=ts2_data.index)
# check again for heteroskedasticity
# display(perform_heteroskedasticity_tests(ts1_transformed))
# display(perform_heteroskedasticity_tests(ts2_transformed))
# use the original data if the tests pass or could not be run
ts1_transformed = ts1_data
ts2_transformed = ts2_data
| Statistic | p-value | Hypothesis | Heteroskedasticity | |
|---|---|---|---|---|
| ARCH | -0.999366 | 1.000000 | FAIL to reject H0 | no |
| Breusch-Pagan | -1.016708 | 1.000000 | FAIL to reject H0 | no |
| White | -1.033809 | nan | FAIL to reject H0 | no |
| Statistic | p-value | Hypothesis | Heteroskedasticity | |
|---|---|---|---|---|
| ARCH | -1.059407 | 1.000000 | FAIL to reject H0 | no |
| Breusch-Pagan | 0.095871 | 0.756842 | FAIL to reject H0 | no |
| White | NaN | nan | failed | failed |
# manual workflow to handle stationarity (and difference if needed)
# plot timeseries for checking effects
# ======= ts1 =======
ts1_result_stationarity = check_stationarity(ts1_transformed, ts1_identification)
ts1_transformed.plot();plt.show()
print(f"(1) Differencing {ts1_identification}")
ts1_processed = ts1_transformed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
print(f"(2) Differencing {ts1_identification}")
ts1_processed = ts1_processed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
print(f"(3) Differencing {ts1_identification}")
ts1_processed = ts1_processed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
# ======= ts2 =======
print()
ts2_result_stationarity = check_stationarity(ts2_transformed, ts2_identification)
ts2_transformed.plot();plt.show()
print(f"(1) Differencing {ts2_identification}")
ts2_processed = ts2_transformed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
print(f"(2) Differencing {ts2_identification}")
ts2_processed = ts2_processed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
print(f"(3) Differencing {ts2_identification}")
ts2_processed = ts2_processed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
Testing stationarity for Released Earthquake Energy
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -6.721473 | 0.0 | 0 | Reject H0 | yes |
| KPSS | 0.191778 | 0.1 | 1 | FAIL to reject H0 | yes |
(1) Differencing Released Earthquake Energy
Testing stationarity for Released Earthquake Energy
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -6.591983 | 0.0 | 2 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 48 | Reject H0 | no |
(2) Differencing Released Earthquake Energy
Testing stationarity for Released Earthquake Energy
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -3.505297 | 0.007852 | 7 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 47 | Reject H0 | no |
(3) Differencing Released Earthquake Energy
Testing stationarity for Released Earthquake Energy
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -6.478318 | 0.0 | 6 | Reject H0 | yes |
| KPSS | 0.093817 | 0.1 | 8 | FAIL to reject H0 | yes |
Testing stationarity for Erupted Volume
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -6.818354 | 0.0 | 0 | Reject H0 | yes |
| KPSS | 0.087075 | 0.1 | 1 | FAIL to reject H0 | yes |
(1) Differencing Erupted Volume
Testing stationarity for Erupted Volume
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -6.386663 | 0.0 | 2 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 48 | Reject H0 | no |
(2) Differencing Erupted Volume
Testing stationarity for Erupted Volume
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -4.601648 | 0.000128 | 7 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 47 | Reject H0 | no |
(3) Differencing Erupted Volume
Testing stationarity for Erupted Volume
| Statistic | p-value | Number of Lags | Hypothesis | Stationarity | |
|---|---|---|---|---|---|
| ADF | -4.347204 | 0.000368 | 9 | Reject H0 | yes |
| KPSS | 0.125257 | 0.1 | 11 | FAIL to reject H0 | yes |
# Call the function with set parameters and different lag-ranges
show_crosscorrelation_results(
    ts1=ts1_processed,
    ts2=ts2_processed,
    ts1_id=ts1_identification,
    ts2_id=ts2_identification,
    plot_type='stem',
    plotwidth=2,
    plotsize=4,
    top_n=5,
    lag_limits=None,
    figsize=(16, 6),
    show_min_max_table=True,
    order_min_max_table=True,
    legend_cols=4,
    fft_highpower_lags=None,
    show_fft_highpower_lags_analysis=False,
    clustername=selected_subdir,
    scope=parameters.scope_id,
    figure_save_dir=data_ts_dir_cluster
)
# Call the function with set parameters and different lag-ranges
show_crosscorrelation_results(
    ts1=ts1_processed,
    ts2=ts2_processed,
    ts1_id=ts1_identification,
    ts2_id=ts2_identification,
    plot_type='line',
    plotwidth=2,
    plotsize=4,
    top_n=5,
    lag_limits=(0, 30),
    figsize=(16, 6),
    show_min_max_table=True,
    order_min_max_table=True,
    legend_cols=4,
    fft_highpower_lags=None,
    show_fft_highpower_lags_analysis=False,
    clustername=selected_subdir,
    scope=parameters.scope_id,
    figure_save_dir=data_ts_dir_cluster
)
2025-01-27 19:20:38,935 - INFO - None: 50_CCF_(Released_Earthquake_Energy_Erupted_Volume)_(A-DEC)_(None).png to './data/scope/reclustered_1000/cluster_Cluster_0'.
CCF min-max values for all lags
| lag | ccf_coeff | ci_lower | ci_upper | sig | rank | label |
|---|---|---|---|---|---|---|
| 1 | 0.034592 | -0.285890 | 0.285890 | 5 | max | |
| 18 | 0.291403 | -0.357839 | 0.357839 | 2 | max | |
| 19 | -0.744443 | -0.363956 | 0.363956 | sig | 1 | min |
| 20 | 0.995833 | -0.370398 | 0.370398 | sig | 1 | max |
| 21 | -0.738971 | -0.377195 | 0.377195 | sig | 2 | min |
| 22 | 0.287173 | -0.384381 | 0.384381 | 3 | max | |
| 23 | -0.045411 | -0.391993 | 0.391993 | 4 | min | |
| 27 | -0.044001 | -0.427699 | 0.427699 | 5 | min | |
| 28 | 0.069461 | -0.438261 | 0.438261 | 4 | max | |
| 29 | -0.047607 | -0.449647 | 0.449647 | 3 | min |
2025-01-27 19:20:39,413 - INFO - None: 50_CCF_(Released_Earthquake_Energy_Erupted_Volume)_(A-DEC)_(0_30).png to './data/scope/reclustered_1000/cluster_Cluster_0'.
CCF min-max values for all lags
| lag | ccf_coeff | ci_lower | ci_upper | sig | rank | label |
|---|---|---|---|---|---|---|
| 1 | 0.034592 | -0.285890 | 0.285890 | 5 | max | |
| 18 | 0.291403 | -0.357839 | 0.357839 | 2 | max | |
| 19 | -0.744443 | -0.363956 | 0.363956 | sig | 1 | min |
| 20 | 0.995833 | -0.370398 | 0.370398 | sig | 1 | max |
| 21 | -0.738971 | -0.377195 | 0.377195 | sig | 2 | min |
| 22 | 0.287173 | -0.384381 | 0.384381 | 3 | max | |
| 23 | -0.045411 | -0.391993 | 0.391993 | 4 | min | |
| 27 | -0.044001 | -0.427699 | 0.427699 | 5 | min | |
| 28 | 0.069461 | -0.438261 | 0.438261 | 4 | max | |
| 29 | -0.047607 | -0.449647 | 0.449647 | 3 | min |
# select the two timeseries
t1_data = eq_ts_yearly_multi.copy()
ts1_identification = 'Released Earthquake Energy'
ts1_data = impute_timeseries(ts=t1_data['energy_sum'], identification=ts1_identification)
t2_data = erupt_ts_yearly_multi.copy()
ts2_identification = 'Starting Eruptions'
ts2_data = impute_timeseries(ts=t2_data['StartingEruptions_count'], identification=ts2_identification)
# handle non-matching values for Box-Cox transformations in both timeseries
negative_indices1 = ts1_data[ts1_data <= 0].index
print("\nts1: Non-matching values for Box Cox transformation:", len(negative_indices1))
for idx in negative_indices1:
    print(f"Index: {idx}, Value: {ts1_data.loc[idx]}")
# ts1_data = ts1_data.drop(negative_indices1); ts2_data = ts2_data.drop(negative_indices1)
ts1_data.loc[negative_indices1] = 0.000001
negative_indices2 = ts2_data[ts2_data <= 0].index
print("\nts2: Non-matching values for Box Cox transformation:", len(negative_indices2))
for idx in negative_indices2:
    print(f"Index: {idx}, Value: {ts2_data.loc[idx]}")
# ts1_data = ts1_data.drop(negative_indices2); ts2_data = ts2_data.drop(negative_indices2)
ts2_data.loc[negative_indices2] = 0.000001
No NaN found in (Released Earthquake Energy)!
No NaN found in (Starting Eruptions)!
ts1: Non-matching values for Box Cox transformation: 0
ts2: Non-matching values for Box Cox transformation: 0
# check for heteroskedasticity
try:
    display(perform_heteroskedasticity_tests(ts1_data))
except Exception as e:
    print(f"Error running heteroskedasticity tests for ts1: {e}")
try:
    display(perform_heteroskedasticity_tests(ts2_data))
except Exception as e:
    print(f"Error running heteroskedasticity tests for ts2: {e}")
# # Manually apply Box-Cox transformation if heteroskedasticity is detected
# lambda_value = 5.0
# transformed_values1 = boxcox(ts1_data, lmbda=lambda_value)
# ts1_transformed = pd.Series(transformed_values1, index=ts1_data.index)
# ts2_transformed = ts2_data
# lambda_value = 13.0
# transformed_values2 = boxcox(ts2_data, lmbda=lambda_value)
# ts2_transformed = pd.Series(transformed_values2, index=ts2_data.index)
# # check again for heteroskedasticity
# display(perform_heteroskedasticity_tests(ts1_transformed))
# display(perform_heteroskedasticity_tests(ts2_transformed))
# fall back to the original data since the tests detect no heteroskedasticity (or could not be run)
ts1_transformed = ts1_data
ts2_transformed = ts2_data
| | Statistic | p-value | Hypothesis | Heteroskedasticity |
|---|---|---|---|---|
| ARCH | -0.999366 | 1.000000 | FAIL to reject H0 | no |
| Breusch-Pagan | -1.016708 | 1.000000 | FAIL to reject H0 | no |
| White | -1.033809 | nan | FAIL to reject H0 | no |
| | Statistic | p-value | Hypothesis | Heteroskedasticity |
|---|---|---|---|---|
| ARCH | 9.033038 | 0.528971 | FAIL to reject H0 | no |
| Breusch-Pagan | 0.180992 | 0.670522 | FAIL to reject H0 | no |
| White | 0.448855 | 0.798974 | FAIL to reject H0 | no |
# manual workflow to check stationarity (and difference if needed)
# plot the timeseries to inspect the effect of each step
# ======= ts1 =======
ts1_result_stationarity = check_stationarity(ts1_transformed, ts1_identification)
ts1_transformed.plot();plt.show()
print(f"(1) Differencing {ts1_identification}")
ts1_processed = ts1_transformed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
print(f"(2) Differencing {ts1_identification}")
ts1_processed = ts1_processed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
print(f"(3) Differencing {ts1_identification}")
ts1_processed = ts1_processed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
# ======= ts2 =======
print()
ts2_result_stationarity = check_stationarity(ts2_transformed, ts2_identification)
ts2_transformed.plot();plt.show()
print(f"(1) Differencing {ts2_identification}")
ts2_processed = ts2_transformed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
print(f"(2) Differencing {ts2_identification}")
ts2_processed = ts2_processed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
print(f"(3) Differencing {ts2_identification}")
ts2_processed = ts2_processed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
Testing stationarity for Released Earthquake Energy
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -6.721473 | 0.0 | 0 | Reject H0 | yes |
| KPSS | 0.191778 | 0.1 | 1 | FAIL to reject H0 | yes |
(1) Differencing Released Earthquake Energy
Testing stationarity for Released Earthquake Energy
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -6.591983 | 0.0 | 2 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 48 | Reject H0 | no |
(2) Differencing Released Earthquake Energy
Testing stationarity for Released Earthquake Energy
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -3.505297 | 0.007852 | 7 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 47 | Reject H0 | no |
(3) Differencing Released Earthquake Energy
Testing stationarity for Released Earthquake Energy
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -6.478318 | 0.0 | 6 | Reject H0 | yes |
| KPSS | 0.093817 | 0.1 | 8 | FAIL to reject H0 | yes |
Testing stationarity for Starting Eruptions
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -7.549589 | 0.0 | 0 | Reject H0 | yes |
| KPSS | 0.358913 | 0.094865 | 2 | FAIL to reject H0 | yes |
(1) Differencing Starting Eruptions
Testing stationarity for Starting Eruptions
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -8.797681 | 0.0 | 2 | Reject H0 | yes |
| KPSS | 0.192489 | 0.1 | 18 | FAIL to reject H0 | yes |
(2) Differencing Starting Eruptions
Testing stationarity for Starting Eruptions
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -6.514636 | 0.0 | 5 | Reject H0 | yes |
| KPSS | 0.058254 | 0.1 | 2 | FAIL to reject H0 | yes |
(3) Differencing Starting Eruptions
Testing stationarity for Starting Eruptions
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -7.32307 | 0.0 | 6 | Reject H0 | yes |
| KPSS | 0.049038 | 0.1 | 4 | FAIL to reject H0 | yes |
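The manual difference-check-plot loop above could also be automated along the following lines (a sketch; `difference_until_stationary` is a hypothetical helper, not the shared `check_stationarity`, and calls the `statsmodels` ADF/KPSS tests directly):

```python
from statsmodels.tsa.stattools import adfuller, kpss

def difference_until_stationary(ts, max_d=3, alpha=0.05):
    """Difference a pandas Series until ADF rejects its H0 (unit root)
    and KPSS fails to reject its H0 (level stationarity), or max_d is hit.
    Returns the (possibly differenced) series and the order of differencing."""
    current = ts.copy()
    for d in range(max_d + 1):
        adf_p = adfuller(current, autolag="AIC")[1]
        kpss_p = kpss(current, regression="c", nlags="auto")[1]
        if adf_p < alpha and kpss_p >= alpha:
            return current, d
        if d < max_d:
            current = current.diff().dropna()
    return current, max_d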
# Call the CCF routine with the chosen parameters (full lag range)
show_crosscorrelation_results(
ts1=ts1_processed,
ts2=ts2_processed,
ts1_id=ts1_identification,
ts2_id=ts2_identification,
plot_type='stem',
plotwidth=2,
plotsize=4,
top_n=5,
lag_limits=None,
figsize=(16, 6),
show_min_max_table=True,
order_min_max_table=True,
legend_cols=4,
fft_highpower_lags=None,
show_fft_highpower_lags_analysis=False,
clustername=selected_subdir,
scope=parameters.scope_id,
figure_save_dir=data_ts_dir_cluster
)
# Call the CCF routine again with a restricted lag range
show_crosscorrelation_results(
ts1=ts1_processed,
ts2=ts2_processed,
ts1_id=ts1_identification,
ts2_id=ts2_identification,
plot_type='line',
plotwidth=2,
plotsize=4,
top_n=5,
lag_limits=(-20,20),
figsize=(16, 6),
show_min_max_table=True,
order_min_max_table=True,
legend_cols=4,
fft_highpower_lags=None,
show_fft_highpower_lags_analysis=False,
clustername=selected_subdir,
scope=parameters.scope_id,
figure_save_dir=data_ts_dir_cluster
)
2025-01-27 19:21:48,152 - INFO - None: 50_CCF_(Released_Earthquake_Energy_Starting_Eruptions)_(A-DEC)_(None).png to './data/scope/reclustered_1000/cluster_Cluster_0'.
CCF min-max values for all lags
| lag | ccf_coeff | ci_lower | ci_upper | sig | rank | label |
|---|---|---|---|---|---|---|
| 0 | -0.201734 | -0.285890 | 0.285890 | | 5 | min |
| 1 | 0.287434 | -0.285890 | 0.285890 | sig | 4 | max |
| 2 | -0.345177 | -0.288981 | 0.288981 | sig | 3 | min |
| 3 | 0.493911 | -0.292174 | 0.292174 | sig | 2 | max |
| 4 | -0.686430 | -0.295476 | 0.295476 | sig | 1 | min |
| 5 | 0.736140 | -0.298892 | 0.298892 | sig | 1 | max |
| 6 | -0.596008 | -0.302429 | 0.302429 | sig | 2 | min |
| 7 | 0.356325 | -0.306095 | 0.306095 | sig | 3 | max |
| 38 | 0.240506 | -0.619795 | 0.619795 | | 5 | max |
| 39 | -0.224911 | -0.653321 | 0.653321 | | 4 | min |
2025-01-27 19:21:48,634 - INFO - None: 50_CCF_(Released_Earthquake_Energy_Starting_Eruptions)_(A-DEC)_(-20_20).png to './data/scope/reclustered_1000/cluster_Cluster_0'.
CCF min-max values for all lags
| lag | ccf_coeff | ci_lower | ci_upper | sig | rank | label |
|---|---|---|---|---|---|---|
| 0 | -0.201734 | -0.285890 | 0.285890 | | 5 | min |
| 1 | 0.287434 | -0.285890 | 0.285890 | sig | 4 | max |
| 2 | -0.345177 | -0.288981 | 0.288981 | sig | 3 | min |
| 3 | 0.493911 | -0.292174 | 0.292174 | sig | 2 | max |
| 4 | -0.686430 | -0.295476 | 0.295476 | sig | 1 | min |
| 5 | 0.736140 | -0.298892 | 0.298892 | sig | 1 | max |
| 6 | -0.596008 | -0.302429 | 0.302429 | sig | 2 | min |
| 7 | 0.356325 | -0.306095 | 0.306095 | sig | 3 | max |
| 38 | 0.240506 | -0.619795 | 0.619795 | | 5 | max |
| 39 | -0.224911 | -0.653321 | 0.653321 | | 4 | min |
# select the two timeseries
t1_data = eq_ts_yearly_multi.copy()
ts1_identification = 'Max Earthquake Magnitude'
ts1_data = impute_timeseries(ts=t1_data['energy_max'], identification=ts1_identification)
t2_data = erupt_ts_yearly_multi.copy()
ts2_identification = 'Max VEI'
ts2_data = impute_timeseries(ts=t2_data['ExplosivityIndexMax_max'], identification=ts2_identification)
# handle non-matching values for Box Cox transformations for both timeseries
negative_indices1 = ts1_data[ts1_data <= 0].index
print("\nts1: Non-matching values for Box Cox transformation:", len(negative_indices1))
for idx in negative_indices1:
print(f"Index: {idx}, Value: {ts1_data.loc[idx]}")
#ts1_data = ts1_data.drop(negative_indices1); ts2_data = ts2_data.drop(negative_indices1)
ts1_data.loc[negative_indices1] = 0.000001
negative_indices2 = ts2_data[ts2_data <= 0].index
print("\nts2: Non-matching values for Box Cox transformation:", len(negative_indices2))
for idx in negative_indices2:
print(f"Index: {idx}, Value: {ts2_data.loc[idx]}")
#ts1_data = ts1_data.drop(negative_indices2); ts2_data = ts2_data.drop(negative_indices2)
ts2_data.loc[negative_indices2] = 0.000001
No NaN found in (Max Earthquake Magnitude)!
No NaN found in (Max VEI)!
ts1: Non-matching values for Box Cox transformation: 0
ts2: Non-matching values for Box Cox transformation: 0
# check for heteroskedasticity
try:
    display(perform_heteroskedasticity_tests(ts1_data))
except Exception as e:
    print(f"Error running heteroskedasticity tests for ts1: {e}")
try:
    display(perform_heteroskedasticity_tests(ts2_data))
except Exception as e:
    print(f"Error running heteroskedasticity tests for ts2: {e}")
# # Manually apply Box-Cox transformation if heteroskedasticity is detected
# lambda_value = 4.5
# transformed_values1 = boxcox(ts1_data, lmbda=lambda_value)
# ts1_transformed = pd.Series(transformed_values1, index=ts1_data.index)
# lambda_value = 3.0
# transformed_values2 = boxcox(ts2_data, lmbda=lambda_value)
# ts2_transformed = pd.Series(transformed_values2, index=ts2_data.index)
# # check again for heteroskedasticity
# display(perform_heteroskedasticity_tests(ts1_transformed))
# display(perform_heteroskedasticity_tests(ts2_transformed))
# fall back to the original data since the tests detect no heteroskedasticity (or could not be run)
ts1_transformed = ts1_data
ts2_transformed = ts2_data
| | Statistic | p-value | Hypothesis | Heteroskedasticity |
|---|---|---|---|---|
| ARCH | -0.980649 | 1.000000 | FAIL to reject H0 | no |
| Breusch-Pagan | -1.025735 | 1.000000 | FAIL to reject H0 | no |
| White | -1.027389 | nan | FAIL to reject H0 | no |
| | Statistic | p-value | Hypothesis | Heteroskedasticity |
|---|---|---|---|---|
| ARCH | 8.732857 | 0.557630 | FAIL to reject H0 | no |
| Breusch-Pagan | 0.016923 | 0.896498 | FAIL to reject H0 | no |
| White | 0.179042 | 0.914369 | FAIL to reject H0 | no |
# manual workflow to check stationarity (and difference if needed)
# plot the timeseries to inspect the effect of each step
# ======= ts1 =======
ts1_result_stationarity = check_stationarity(ts1_transformed, ts1_identification)
ts1_transformed.plot();plt.show()
print(f"(1) Differencing {ts1_identification}")
ts1_processed = ts1_transformed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
print(f"(2) Differencing {ts1_identification}")
ts1_processed = ts1_processed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
print(f"(3) Differencing {ts1_identification}")
ts1_processed = ts1_processed.diff().dropna()
ts1_result_stationarity = check_stationarity(ts1_processed, ts1_identification)
ts1_processed.plot();plt.show()
# ======= ts2 =======
print()
ts2_result_stationarity = check_stationarity(ts2_transformed, ts2_identification)
ts2_transformed.plot();plt.show()
print(f"(1) Differencing {ts2_identification}")
ts2_processed = ts2_transformed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
print(f"(2) Differencing {ts2_identification}")
ts2_processed = ts2_processed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
print(f"(3) Differencing {ts2_identification}")
ts2_processed = ts2_processed.diff().dropna()
ts2_result_stationarity = check_stationarity(ts2_processed, ts2_identification)
ts2_processed.plot();plt.show()
Testing stationarity for Max Earthquake Magnitude
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -6.843847 | 0.0 | 0 | Reject H0 | yes |
| KPSS | 0.198282 | 0.1 | 1 | FAIL to reject H0 | yes |
(1) Differencing Max Earthquake Magnitude
Testing stationarity for Max Earthquake Magnitude
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -6.611468 | 0.0 | 2 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 48 | Reject H0 | no |
(2) Differencing Max Earthquake Magnitude
Testing stationarity for Max Earthquake Magnitude
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -3.982198 | 0.001507 | 7 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 47 | Reject H0 | no |
(3) Differencing Max Earthquake Magnitude
Testing stationarity for Max Earthquake Magnitude
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -3.16383 | 0.022176 | 10 | Reject H0 | yes |
| KPSS | 0.093663 | 0.1 | 8 | FAIL to reject H0 | yes |
Testing stationarity for Max VEI
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -4.060245 | 0.001126 | 5 | Reject H0 | yes |
| KPSS | 0.106729 | 0.1 | 1 | FAIL to reject H0 | yes |
(1) Differencing Max VEI
Testing stationarity for Max VEI
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -3.991679 | 0.001455 | 8 | Reject H0 | yes |
| KPSS | 0.19898 | 0.1 | 17 | FAIL to reject H0 | yes |
(2) Differencing Max VEI
Testing stationarity for Max VEI
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -7.792373 | 0.0 | 3 | Reject H0 | yes |
| KPSS | 0.5 | 0.041667 | 47 | Reject H0 | no |
(3) Differencing Max VEI
Testing stationarity for Max VEI
| | Statistic | p-value | Number of Lags | Hypothesis | Stationarity |
|---|---|---|---|---|---|
| ADF | -4.314077 | 0.00042 | 8 | Reject H0 | yes |
| KPSS | 0.190466 | 0.1 | 14 | FAIL to reject H0 | yes |
# Call the CCF routine with the chosen parameters (full lag range)
show_crosscorrelation_results(
ts1=ts1_processed,
ts2=ts2_processed,
ts1_id=ts1_identification,
ts2_id=ts2_identification,
plot_type='stem',
plotwidth=2,
plotsize=4,
top_n=5,
lag_limits=None,
figsize=(16, 6),
show_min_max_table=True,
order_min_max_table=True,
legend_cols=4,
fft_highpower_lags=None,
show_fft_highpower_lags_analysis=False,
clustername=selected_subdir,
scope=parameters.scope_id,
figure_save_dir=data_ts_dir_cluster
)
# Call the CCF routine again with a restricted lag range
show_crosscorrelation_results(
ts1=ts1_processed,
ts2=ts2_processed,
ts1_id=ts1_identification,
ts2_id=ts2_identification,
plot_type='line',
plotwidth=2,
plotsize=4,
top_n=5,
lag_limits=(0, 40),
figsize=(16, 6),
show_min_max_table=True,
order_min_max_table=True,
legend_cols=4,
fft_highpower_lags=None,
show_fft_highpower_lags_analysis=False,
clustername=selected_subdir,
scope=parameters.scope_id,
figure_save_dir=data_ts_dir_cluster
)
2025-01-27 19:23:18,976 - INFO - None: 50_CCF_(Max_Earthquake_Magnitude_Max_VEI)_(A-DEC)_(None).png to './data/scope/reclustered_1000/cluster_Cluster_0'.
CCF min-max values for all lags
| lag | ccf_coeff | ci_lower | ci_upper | sig | rank | label |
|---|---|---|---|---|---|---|
| 15 | 0.257307 | -0.341186 | 0.341186 | | 4 | max |
| 16 | -0.234054 | -0.346476 | 0.346476 | | 5 | min |
| 19 | -0.519117 | -0.363956 | 0.363956 | sig | 2 | min |
| 20 | 0.712759 | -0.370398 | 0.370398 | sig | 1 | max |
| 21 | -0.542875 | -0.377195 | 0.377195 | sig | 1 | min |
| 22 | 0.220716 | -0.384381 | 0.384381 | | 5 | max |
| 29 | -0.352990 | -0.449647 | 0.449647 | | 4 | min |
| 30 | 0.449033 | -0.461968 | 0.461968 | | 2 | max |
| 31 | -0.403789 | -0.475361 | 0.475361 | | 3 | min |
| 32 | 0.281834 | -0.489991 | 0.489991 | | 3 | max |
2025-01-27 19:23:19,474 - INFO - None: 50_CCF_(Max_Earthquake_Magnitude_Max_VEI)_(A-DEC)_(0_40).png to './data/scope/reclustered_1000/cluster_Cluster_0'.
CCF min-max values for all lags
| lag | ccf_coeff | ci_lower | ci_upper | sig | rank | label |
|---|---|---|---|---|---|---|
| 15 | 0.257307 | -0.341186 | 0.341186 | | 4 | max |
| 16 | -0.234054 | -0.346476 | 0.346476 | | 5 | min |
| 19 | -0.519117 | -0.363956 | 0.363956 | sig | 2 | min |
| 20 | 0.712759 | -0.370398 | 0.370398 | sig | 1 | max |
| 21 | -0.542875 | -0.377195 | 0.377195 | sig | 1 | min |
| 22 | 0.220716 | -0.384381 | 0.384381 | | 5 | max |
| 29 | -0.352990 | -0.449647 | 0.449647 | | 4 | min |
| 30 | 0.449033 | -0.461968 | 0.461968 | | 2 | max |
| 31 | -0.403789 | -0.475361 | 0.475361 | | 3 | min |
| 32 | 0.281834 | -0.489991 | 0.489991 | | 3 | max |
# log the end of this script
logging.info(f"End of script '{script_name}'.")
%%js
// -------------------------------------------
// What are the headings of this document?
// --> Get a better overview of the structure.
// -------------------------------------------
function listHeadings() {
let headings = [];
let cells = Jupyter.notebook.get_cells();
cells.forEach((cell) => {
if (cell.cell_type === 'markdown') {
let text = cell.get_text();
let lines = text.split('\n');
lines.forEach((line) => {
let match = line.match(/^(#+)\s+(.*)/);
if (match) {
headings.push({
level: match[1].length, // Number of # symbols indicates the heading level
text: match[2].trim()
});
}
});
}
});
return headings;
}
let headings = listHeadings();
headings.forEach((heading) => {
let markdown = `${'#'.repeat(heading.level)} ${heading.text}`;
element.append(`${markdown}<br>`);
});
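The `%%js` cell above relies on the classic-Notebook `Jupyter.notebook` API and will not run in JupyterLab; a Python alternative (a sketch using `nbformat`; `list_headings` is a hypothetical helper) reads the saved `.ipynb` file instead:

```python
import re
import nbformat

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)")

def list_headings(notebook_path):
    """Collect (level, text) for every markdown heading in a saved notebook."""
    nb = nbformat.read(notebook_path, as_version=4)
    headings = []
    for cell in nb.cells:
        if cell.cell_type == "markdown":
            for line in cell.source.splitlines():
                match = HEADING_RE.match(line)
                if match:
                    headings.append((len(match.group(1)), match.group(2).strip()))
    return headings
```

Unlike the JavaScript version, this reads the file on disk, so the notebook must be saved first (as is done in the final cell below before exporting to HTML).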
# save the notebook as an experimentation protocol
time.sleep(2)
display(Javascript('IPython.notebook.save_checkpoint();'))
time.sleep(2)
save_notebook_as_html(script_name, data_dir_cluster)